1 Introduction

Active banks play an important part in a country's economy by providing the financing needed for projects and companies of all kinds, thereby contributing to an effective economic movement despite the many challenges that the banking system faces and that influence its growth and development. Financial institutions that want to achieve their objectives must measure the efficiency of their banking operations [1, 2]. The main challenge stems from the large number of financially weakened banks within the banking sector. This situation leads to an inefficient allocation of existing resources and impedes the growth of the banking sector as well as the overall economy, producing an inefficient and unstable banking system and, in turn, economic stagnation [3].

Financial institutions play an essential part in generating economic growth by mobilizing financial savings, putting those savings to productive use, and transforming a variety of risks. Strengthening financial institutions in emerging markets and developing economies has been one of the most significant challenges these economies face. The performance of the banking sector matters at both the macro- and micro-levels because the sector plays a critical role in distributing the financial resources available to the economy. An effective banking system also increases the efficiency of the government's macroeconomic policies [2, 4].

The objectives pursued by banks, such as profit maximization, liquidity maintenance, and risk minimization, are inherently conflicting, which makes the banking function a highly demanding undertaking [3]. Since decreasing the average cost of financial transactions promotes overall societal welfare, bank efficiency is a socially optimal goal and therefore a crucial subject of study [5]. The purpose of conducting an efficiency review of banks is to detect deviations or obstacles and then work to address any weaknesses that have been identified [3]. It is therefore essential to carry out a comprehensive examination of the particular characteristics of the Egyptian banking sector.

Recently, many approaches have combined the data envelopment analysis (DEA) technique with machine learning algorithms to assess the efficiency of commercial banks. The performance of banks has traditionally been evaluated with parametric methodologies, such as financial ratios. However, many scholars regard these methods as insufficient and unable to meet the expectations of bank managers. Nonparametric methods, such as DEA, have become increasingly popular for examining bank efficiency [6, 7]. DEA is a mathematical methodology that employs linear programming techniques to assess the efficiency of many subjects (banks) based on a wide variety of inputs and outputs [6, 8]. The subjects are the decision-making units (DMUs) examined by DEA models [9].

A DMU is rated as efficient on the available evidence if and only if the performances of other DMUs do not show that some of its inputs or outputs could be improved without worsening some of its other inputs or outputs [9]. There are many DEA models, but the two used most frequently are the CCR model (after Charnes, Cooper, and Rhodes, 1978) and the BCC model (after Banker, Charnes, and Cooper, 1984). The two models differ fundamentally in their treatment of returns to scale: CCR assumes that each DMU operates under constant returns to scale (CRS), whereas BCC allows variable returns to scale (VRS) [10, 11]. Under CRS the rate of increase in output equals the rate of increase in inputs, while under VRS the rate of increase in production differs from that of the inputs and the focus moves to pure technical efficiency [6].

The DEA method assesses the efficiency of each DMU within a given sample by comparing it to the best practice observed within the same sample, and it identifies inefficiencies within a specific DMU by comparing it to other DMUs that are considered efficient [2]. The chief advantage of DEA is that it requires no assumptions about the distribution of inefficiency or a specific functional form of the data to find the most efficient DMU [2, 3]. On the other hand, the fundamental premise of the approach is that random errors do not occur and that any deviation from the frontier reflects inefficiency; the main disadvantage of DEA is that the frontier is sensitive to extreme observations and measurement errors. Another disadvantage is that calculating the efficiency score for a new DMU requires rerunning the DEA model, which consumes considerable memory and CPU time, especially with the rapid growth of big data [10, 12, 13].

As a result, an approach that uses machine learning (ML) techniques to forecast the efficiency score should be used [12]. With the assistance of ML approaches such as the support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), and the AdaBoost classifier (ADA), the parameters of a great number of nonparametric and nonlinear problems have been successfully estimated. For instance, both DEA and machine learning algorithms make assumptions about the functional form of the relationship between the inputs and outputs of their respective systems [14]. A bank's efficiency can also be evaluated comprehensively across a variety of performance metrics employing many financial factors, which suggests that the relationship between bank efficiency and these variables is highly complex and not easy to understand [15]. This creates good opportunities for integrating DEA and machine learning models in the banking sector [16].

Enhancing prediction accuracy and developing models that generate more precise outcomes are among the most significant challenges confronting machine learning researchers; achieving this objective may be facilitated by increasing the quantity of training data, adopting larger architectures, and providing additional computing resources [17]. In the field of evaluating banking efficiency, scholars have recently paid little attention to ensemble-based approaches, despite their generally higher reliability, better performance, and lower computing requirements [18]. Using a variety of machine learning models has been suggested as a means of performing an efficiency analysis of financial institutions [18].

It is well known that every ML method has its disadvantages, such as error, overfitting, bias, limited prediction accuracy, and limited robustness. Better robustness and better predictions are the fundamental benefits of ensemble learning methods, whose major aims are to reduce prediction variance and bias and/or to improve performance. Predictions and model stability can be unpredictable with many individual ML methods; ensemble learning techniques therefore produce more reliable predictions than an individual model and exhibit superior predictive ability. Although ML techniques have many benefits, the variance and bias problem may not be adequately addressed by a single model or a single ensemble method. Consequently, a more effective method, such as a stacked ensemble, becomes preferable for dealing with the issue [18, 19].

Ensemble learning methods can be categorized into three classes: bagging, boosting, and stacking. The ensemble-based approach known as "bagging," also called "bootstrap aggregation," creates many samples of the training dataset, each of which is used to train a different classification model. Bagging primarily reduces variance rather than bias; however, it may not work as well with simpler models [20, 21]. The technique of boosting creates an ensemble by merging weak learners, with subsequent models correcting the errors produced by preceding models; the objective of boosting is to iteratively merge weak learners in order to reduce both bias and variance. It should be acknowledged that the boosting technique is susceptible to noisy data and outliers, which might lead to overfitting [20, 22].

Finally, stacking is considered one of the most effective methods for aggregating numerous learners and combining multiple models. In the stacking approach, a learning algorithm uses the outcomes of multiple other algorithms to generate predictions of the true values in the test set. Stacking reduces both variance and bias by correcting the errors produced by the base learners; this is achieved by constructing one or more meta-models based on the predictions generated by the base learners. The stacking approach typically involves a set of base learners at level 0, along with a meta-learner at level 1 [2, 18, 22].

The super learner (SL) approach is an extension of the stacking technique that generates an ensemble based on cross-validation. The SL is a weighted combination of many candidate learners generated by various techniques [23,24,25]; the methodology has been investigated theoretically and supported by academic research. The SL can outperform the constituent algorithms used in its construction by minimizing a cross-validated loss function [23].

The SL algorithm applies a group of potential prediction algorithms, the base learners, to the underlying dataset in the context of prediction. SL theory does not mandate a particular degree of diversity among the collection of base learners; instead, the base learners may be any parametric or nonparametric supervised machine learning technique. The SL then operates as a "meta-learner" to combine the base learners in the best way possible. Typically, the meta-learner algorithm is a technique created to minimize the cross-validated risk of a certain loss function. Because the predictions from the many base learners may be strongly correlated, it is advisable to use a meta-learning approach like the super learner, which works well in the presence of collinear predictions [26].

The fundamental benefit of an SL is that it selects the "best" model (as determined by a loss function) for prediction across all individual learners in a cross-validated ensemble. Knowing that the best unbiased estimator has been chosen helps allay worries about misspecified conditional relationships. Handling huge datasets used to be the biggest drawback of advanced prediction algorithms like the SL and other complex machine learning methods, owing to the computing resources required when large datasets with many predictors are used. Today, computing power is less of an issue thanks to modern technologies and a robust open-source software development community [26].

This paper presents a high-performance ensemble learning approach that can efficiently address the error rate of the DEA model and the limitations of other ensemble learning methods when utilizing limited financial data as input for the proposed model. The objectives of this work are summarized in the following points:

  • The super learner, an ensemble learning technique, has been used to predict bank efficiency from different input combinations in order to overcome the drawbacks of the DEA model. The technique uses cross-validation theory with ten folds to overcome the overfitting issue while working with the data.

  • The DEA model and four machine learning models were compared with the suggested model. Our framework's output was compared with competitors to assess its ability to accurately evaluate the operational efficiency of banks with limited financial data.

2 Related work

The evaluation of bank efficiency is an essential and thoroughly studied topic that has received significant attention in this area of research. A variety of statistical and analytical approaches have been used to predict efficiency from financial data [27]. The literature review in this study focuses on the prediction of bank efficiency.

Awoin et al. [28] used three decision tree techniques, namely C5.0, C4.5, and CART, in addition to a random forest technique to find the variables with the highest significance to the model; the results showed that the C5.0 algorithm achieved a 100% accuracy rate, with the drawback that accuracy depends on the size of the data: the larger the data, the better the accuracy. Shetu et al. [29] used seven different machine learning classification algorithms to forecast customer satisfaction and dissatisfaction with online banking, namely logistic regression, random forest, naive Bayes, support vector machine, neural network, decision tree, and K-nearest neighbor, and the results showed that the random forest algorithm outperformed the others and was deemed the most effective in predicting customer satisfaction and frustration.

Anouze et al. [30] conducted a three-stage DEA framework analysis, using random forest as a tool to measure the relative importance of environmental variables in predicting bank performance, and found that the average relative performance of commercial banks in Middle East and North Africa (MENA) countries was 87%; the study's main flaws are the small sample size and the fact that the recommended framework was tested only for banks operating in MENA countries. Mousa et al. [7] used a hybrid approach that combined data envelopment analysis (DEA) with the BCC-O model to estimate DMU efficiency levels and artificial neural networks (ANN) to predict the best financial performance in terms of return on assets and return on equity for Egyptian exchange banks; the empirical results showed high correlations between the actual and expected values, with low prediction errors in both. Future work could rely on a wider sample of public, private, international, and joint banks to compare financial performance forecasts across the banking sector.

Shrivastava et al. [31] employed the synthetic minority oversampling technique (SMOTE) to convert unbalanced data into a balanced format and then applied three distinct machine learning algorithms, namely lasso regression, a bagging technique (random forest), and a boosting technique (AdaBoost). In addition, lasso regression was employed to eliminate redundant features from the predictive model for bank failures. The findings indicated that lasso regression can enhance the efficacy of the AdaBoost algorithm by detecting features that are statistically significant. Omrani et al. [4] proposed a bi-level multi-objective DEA (BLMO DEA) model that can be solved through a fuzzy fractional goal programming (FGP) approach. This methodology enables the simultaneous evaluation of two distinct efficiencies, ultimately leading to a more comprehensive assessment of the efficiency of individual bank branches.

Mirmozaffari et al. [14] utilized a combined optimization approach of DEA and a machine learning clustering method to identify the most efficient DEA decision-making units (DMUs) and the optimal clustering algorithm, respectively; the results showed that the BCC-CCR (Banker, Charnes, and Cooper-Charnes, Cooper, and Rhodes) model in DEA outperformed the machine learning clustering technique in terms of efficiency. Wanke et al. [32] employed the dynamic network DEA model to deal with the underlying connections between significant accounting and financial variables, and the results indicated that bank type, origin, and ownership have different effects on efficiency levels in terms of profit, balance, and financial health indicators.

Mousa et al. [33] applied three different machine learning methodologies, namely linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and random forest (RF), to build predictive models of bank financial performance; the results reveal that the RF technique gives the best predictive model for the variables studied in terms of overall accuracy (86%), with the extremely small sample size being a weakness. Fernandes et al. [8] utilized DEA and a double-bootstrapped truncated regression to construct a Malmquist productivity index and create bank efficiency scores. The authors also employed a bias-correction technique to obtain more accurate scores and examined whether variations in financial conditions have differential impacts on bank efficiency levels; the results showed the presence of a periphery efficiency meta-frontier. Empirical evidence suggests that the productivity of banks is adversely affected by liquidity and credit risk, while capital and profit risk have a favorable impact. However, the dataset's time period does not include the years of relative financial stability in the periphery after 2014 because banking data are not readily available.

Zhu et al. [12] established a linkage between the DEA approach and four machine learning algorithms [a back-propagation neural network (BPNN-DEA), a genetic algorithm (GA) integrated with a back-propagation neural network (GANN-DEA), support vector machines (SVM-DEA), and improved support vector machines (ISVM-DEA)] to assess and predict the DEA efficiency of DMUs; the results showed that GANN-DEA performed best among these algorithms. However, the limited sample size is a problem: the more data, the better the performance. There are also certain technical issues to consider, such as the activation function, the cross-validation design, and the choice of regression model. Nti et al. [34] introduced a novel homogeneous ensemble classifier named GASVM, based on the support vector machine and enhanced with a genetic algorithm (GA) for feature selection and SVM kernel parameter optimization. The purpose of this classifier is to predict the stock market, and the results showed that GASVM outperforms other classifiers such as random forest (RF), decision tree (DT), and neural network (NN), with a prediction accuracy of 93.7% compared to 82.3%, 75.3%, and 80.1%, respectively. The study's weak point is that it used the GA for feature and parameter optimization based on past research findings, without testing other optimization strategies and while neglecting the impact of user emotion and web financial news on stock price movement.

On the other hand, previous studies have used boosting, which is sensitive to noisy data and outliers and may lead to overfitting, whereas bagging may not be as successful when used with relatively simple models. For these reasons, the super learner, an extension of stacking, has been used to assess banking efficiency. This technique has many benefits, including the ability to construct ensembles from several learning models at the same time while correcting the errors caused by the base learners and benefiting from their specific advantages while avoiding their limitations. Additionally, it has been observed that stacking ensembles perform well in the presence of outliers and noise [35].

Despite this, there is still debate about whether or not these methods are the best way to combine the candidate learners, even though their performance is far better than that of a single learner or ensemble [36]. The aforementioned studies indicate that researchers still need to enhance the predictive efficiency of banking methodologies, and that work in this field is currently in progress. The current study presents the super learner technique, an ensemble methodology that overcomes the constraints of the DEA model, namely that the frontier is sensitive to extreme observations and measurement errors, and that outperforms other machine learning models in accurately estimating efficiency.

3 Material and methods

This section discusses the data source, the definition and measurement of variables, the proposed framework’s flowchart, the structure of the super learner approach, and the machine learning models used in this study.

3.1 Study area and data collection

The present experimental study selected banks that are currently operational in Egypt and are traded on the Egyptian exchange. The study used a sample of ten Egyptian banks: Credit Agricole Egypt [37], CIB [38], QNB [39], Arab Bank [40], ADIB [41], Arab African International Bank [42], National Bank of Egypt [43], Banque Misr [44], Banque du Caire [45], and ALEXBANK [46]. The financial records of these banks were obtained online from the banks' official websites and the Egyptian exchange platform. The dataset was obtained from the financial statements of these publicly traded banks over a ten-year period from 2012 to 2021 and consisted of 110 records [47]. The study collected financial data from the decision-making units (DMUs) using variable returns to scale (VRS) technology.

The data included: total IT expenditure (I), which represents banking IT resources such as automated teller machines (ATMs) and Internet resources, crucial tools that banks employ to collect deposits; total fixed assets (X), which represents the bank's long-term tangible assets, such as plant and equipment; profit accrued from investing in securities (P), which represents the net profit of the bank; percentage of performing loans (%L), which represents the share of the total loan amount that is performing; and total deposits (D), which represents the funds raised from the bank's clients, such as the total of savings, checking, and money market accounts [48].

Bank efficiencies were categorized into two classes: class A, which represents banks that exhibit high levels of efficiency, and class B, which represents banks that exhibit low levels of efficiency. Attaining complete efficiency in units or departments is unattainable, and this holds true for bank branches in Egypt as well. In evaluating a system's efficacy, systems that exhibit an efficiency score ranging from 80 to 100% are deemed proficient (efficient) in terms of their operational output. It is imperative for banks to consistently satisfy a predetermined threshold to attain optimal efficiency and competitiveness within the banking industry. The present study utilized a "cut-off point" (DMU efficiency ≥ 0.8) proposed in a previous source [38] to identify highly efficient banks, that is, those with an efficiency value of 80% or greater.

In our case, the study utilized the predictor variables X, I, D, %L, and P. The response variable used in the predictive models consisted of the overall efficiency scores of bank branches, classified as either efficient (class A) or inefficient (class B). Each dataset was split 80-20% into a training set and a testing set, respectively, and the training set was further divided using tenfold cross-validation, as sketched below.
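The split described above can be sketched as follows; the synthetic placeholder arrays and the random seed are illustrative assumptions standing in for the actual bank dataset, not the data used in the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

# Placeholder data standing in for the bank dataset (110 records, 5 predictors)
rng = np.random.default_rng(42)
features = rng.random((110, 5))          # columns: X, I, D, %L, P (hypothetical ordering)
labels = rng.integers(0, 2, size=110)    # 1 = class A (efficient), 0 = class B (inefficient)

# 80-20 split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.20, random_state=42, stratify=labels
)

# Tenfold cross-validation is applied within the training set only
cv = KFold(n_splits=10, shuffle=True, random_state=42)
```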

3.2 Proposed model workflow

The sequential steps involved in implementing the proposed model for this study are depicted in Fig. 1, which starts with the creation of the dataset and ends with the prediction of bank efficiency.

Fig. 1
figure 1

In the proposed workflow, financial data and a DEA model are merged to generate a dataset. Next, the dataset is split into two parts: the training set and the test set. A super learner model is trained on the training set, and the suggested model is then evaluated on the testing set using the evaluation metrics

3.3 Super learner model (SL)

SL is a cross-validation-based strategy for combining machine learning algorithms that yields predictions at least as good as those of the best input algorithm. In this section, we first examine the motivation behind the super learner, then describe how it works, and finally briefly mention some of the theoretical guarantees it provides.

The stacking algorithm acquired an alternative name, "super learner." The SL model is an ensemble model based on loss minimization proposed by van der Laan et al. [23, 49]. The technique employed in this model is known as ensemble learning, which combines multiple individual learner models to attain superior performance [50]. Ensembles are utilized to stack base learners with the aim of minimizing forecasting errors. The super learner outperforms the base learners in this regard, as it effectively identifies and accounts for the systematic errors of the base learners in the final prediction [49].

The SL classifier consists of two layers, namely the base learners and the meta-learner [51]. Theoretical findings suggest that the SL model exhibits superior asymptotic performance compared to other candidate learners [20]. This is achieved through the learning of a meta-learner using the outcomes of multiple base learners. By means of cross-validation, it is possible to generate the level 1 data, which refers to the outcomes of the base learners [20]. One significant advantage of this cross-validation theory-based strategy is that it can overcome the overfitting issue that plagues most other ensemble approaches. Another advantage of the proposed super learner is its adaptability to varied data generating distributions across multiple studies.

To train a number of differentiated base classifiers, the super learner makes use of tenfold cross-validation. With the objective of minimizing the weighted total loss of all base classifiers, the super learner obtains an ideal weighted combination of base classifiers that reduces the cross-validated classification loss [23]. Consequently, this methodology helps enhance the accuracy of the model's predictions as well as its overall robustness [50]. The framework of the SL model used in this study, following [52], is illustrated in Fig. 2. Additionally, the ML-ENS [49, 53] (http://ml-ensemble.com) module was used to create the super learner model.

Fig. 2
figure 2

The super learner’s architecture requires the data to be folded, with each fold then split into a training set and a verification set (V). Predictions are made by first training four different base learner models (SVM, KNN, RF, and ADA) on the training set and then evaluating those models on the verification set of each fold. These predictions are then used to populate a new dataset from Z and V, which is used to train the meta-learner and, finally, to produce the forecasts

The construction methodology of the SL model, as presented in [23], can be concisely summarized as follows. The goal of analyzing a dataset with observations Di = (Xi, Yi), where i = 1, 2, …, n, is to find the optimal classifier model, Ψ0(X) = E(Y|X). The function can be expressed as the minimizer of the expected loss:

$$ \Psi_{0} \left( X \right) = \arg \min E\left[ {L\left( {D, \Psi \left( X \right) } \right)} \right] $$
(1)

where the arg min function returns the Ψ that minimizes the expected loss (risk).

The SL method comprises a set of distinct principles, which are as follows:

1. Fit each algorithm in L on the entire dataset to estimate Ψk(X), k = 1, 2, …, K(n).

2. Divide the dataset D into a training set and a validation set using a V-fold cross-validation scheme: the ordered n observations are divided into V equal-size groups, with the vth group serving as the validation sample and the remaining groups serving as the training sample, v = 1, …, V. T(v) is the vth training data split, while V(v) is the corresponding validation data split.

$$ {\hat{\Psi }}_{k,T\left( v \right)} \left( {X_{i} } \right), D_{i} \in V_{{\left( {\text{v}} \right)}} , v = 1, ..., V $$
(2)

3. Stack the predictions of each algorithm to form an n by K matrix, Z = \(\{ {\Psi }_{k,T\left( v \right)} \left( {X_{{V\left( {\text{v}} \right)}} } \right),\; v = 1, \ldots ,V,\; k = 1, \ldots ,K\}\), where XV(v) = (Xi : Di \(\in\) V(v)) represents the observation data in the validation sample V(v).

4. Combine the collection of candidate base learners with a weight vector α to build a family of weighted combinations:

$$ m\left( {z{|}\alpha } \right) = \mathop \sum \limits_{k = 1}^{K} \alpha_{k} {\Psi }_{k,T\left( v \right)} \left( {X_{{V\left( {\text{v}} \right)}} } \right), \quad \alpha_{k} \ge 0\;\forall \;k, \quad \mathop \sum \limits_{k = 1}^{K} \alpha_{k} = 1 $$
(3)

5. Determine the α that minimizes the cross-validated risk of the candidate estimator \(\sum\nolimits_{k = 1}^{K} {\alpha _{k} } \Psi _{k} \) over all allowable α combinations:

$$ \hat{\alpha } = \arg {\text{min}} \mathop \sum \limits_{i = 1}^{n} \left( {Y_{i} - m\left( {Z_{i} {|}\alpha } \right)} \right)^{2} $$
(4)

6. Combine the optimal weight vector \(\hat{\alpha }\) with \({\Psi }_{k} \left( X \right)\) according to the family of weighted combinations \(m\left( {z{|}\alpha } \right)\) to create the final super learner, where

$$ {\Psi }_{{{\text{SL}}}} \left( X \right) = \mathop \sum \limits_{k = 1}^{K} \hat{\alpha }_{k} {\Psi }_{k} \left( X \right) $$
(5)

Table 1 explains the parameters contained in the formulas above; a minimal code sketch of steps 2-5 follows.
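To make the construction above concrete, the following is a minimal, hand-rolled sketch of steps 2-5 (not the ML-ENS implementation actually used in the study). It assumes binary 0/1 labels and base learners that expose probability predictions (e.g., SVC(probability=True)); the function name and default of ten folds are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.model_selection import cross_val_predict

def super_learner_weights(base_learners, X, y, folds=10):
    """Sketch of steps 2-5: build the cross-validated prediction matrix Z and
    find the weight vector alpha (alpha_k >= 0, sum alpha_k = 1) minimising
    the cross-validated squared loss of Eq. (4)."""
    # Column k of Z holds the out-of-fold predicted probabilities of learner k (Eqs. 2-3)
    Z = np.column_stack([
        cross_val_predict(model, X, y, cv=folds, method="predict_proba")[:, 1]
        for model in base_learners
    ])
    K = Z.shape[1]
    loss = lambda a: np.mean((y - Z @ a) ** 2)                     # Eq. (4)
    constraints = ({"type": "eq", "fun": lambda a: a.sum() - 1},)  # sum alpha_k = 1
    bounds = [(0.0, 1.0)] * K                                      # alpha_k >= 0
    result = minimize(loss, np.full(K, 1.0 / K),
                      bounds=bounds, constraints=constraints)
    # Eq. (5) then combines result.x with the learners refit on the full data
    return result.x, Z
```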

3.4 Machine learning models (base learners)

The machine learning algorithms K-nearest neighbors (KNN), support vector classifier (SVC), random forest (RF), and AdaBoost classifier (ADA) were used as base learners in the super learner model, and the Scikit-learn package [50] (https://scikit-learn.org) in Python 3.8 was used to implement the models employed in this study. A minimal sketch of how these base learners can be assembled into a stacked ensemble appears below, followed by a description of each chosen algorithm:
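The following sketch instantiates the four base learners and combines them with scikit-learn's StackingClassifier using 10-fold cross-validation; this class only approximates the level-0/level-1 super learner layout, since the study itself used the ML-ENS package, and all hyperparameters shown are illustrative defaults rather than the tuned settings.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

# The four base learners named above (level 0)
base_learners = [
    ("knn", KNeighborsClassifier()),
    ("svc", SVC(probability=True)),
    ("rf", RandomForestClassifier(random_state=42)),
    ("ada", AdaBoostClassifier(random_state=42)),
]

# Meta-learner (level 1) trained on 10-fold out-of-fold predictions of the base learners
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=10)
stack.fit(X_train, y_train)          # X_train/y_train from the split sketch in Sect. 3.1
y_pred = stack.predict(X_test)
```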

3.4.1 K-nearest neighbors (KNN)

Today, the KNN approach is frequently employed in data mining models [54]. Due to its simplicity, ease of implementation, adaptability, and high performance, this technique can be effectively applied to both classification and regression problems [54,55,56]. Although the KNN methodology offers numerous advantages, it also has certain limitations. One limitation of the K-nearest neighbor classification approach is that the testing stage incurs higher costs and slower processing times because significant memory is needed to store the complete training dataset. Moreover, the methodology is inadequate for handling incomplete data [54]. kd-trees can be leveraged to speed up K-nearest neighbor (KNN) queries for large datasets [29].

The KNN technique operates by determining the "K" nearest samples in a pre-existing dataset [57]. When the K value is low, the algorithm generalizes less and becomes susceptible to overfitting; conversely, when the K value is high, the model becomes easier to interpret [58]. The following steps and Fig. 3 summarize KNN [56] (a toy code sketch follows Fig. 3):

  1. Find k’s value, as depicted in the diagram: k = 3.

  2. Determine the distance between the blue point and each red point using the Euclidean distance.

  3. Based on k = 3, the two dots with blue color inside the circle and one dot with red color represent the three nearest neighbors.

  4. By averaging the three results, the anticipated value may be calculated.

Fig. 3
figure 3

In KNN for classification issues, the distance between the blue point and each red point is used to get the average value of the three closest points, which is then used as the predicted value
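The four steps above can be written as a toy sketch; this is plain NumPy for illustration, not the scikit-learn KNeighborsClassifier used in the study, and the sample points are made up.

```python
import numpy as np

def knn_predict(query, X_train, y_train, k=3):
    """Toy KNN following the steps above: Euclidean distances, take the k
    nearest training points, and average their labels as the prediction."""
    dists = np.linalg.norm(X_train - query, axis=1)   # step 2: Euclidean distance
    nearest = np.argsort(dists)[:k]                   # step 3: k nearest neighbours
    return y_train[nearest].mean()                    # step 4: average the k outcomes

# Toy usage with made-up points and labels
X_train = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [8.0, 8.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(np.array([2.5, 2.5]), X_train, y_train, k=3))
```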

3.4.2 Support vector machine (SVM)

The support vector machine (SVM) was proposed as a machine learning tool [17, 58, 59] and was created as a nonlinear solution to classification and regression problems [60]. The use of statistical learning theory by SVM technology facilitates the simplification of predictions and judgments [59]. The flexibility of SVMs comes from various kernel functions, including the radial basis function, linear, polynomial, and sigmoid kernels, as reported in previous studies [2]. The primary objective of the SVM is to identify an effective discriminative hyperplane that accurately classifies data points and separates points belonging to two distinct classes by minimizing the risk of misclassifying both training and test samples [58, 60]. On the other hand, SVM performs poorly on noisy datasets [60].

3.4.3 Random forest (RF)

The RF algorithm is widely utilized and involves the aggregation of numerous decision trees generated from a training dataset [61]. Breiman originally proposed this approach, which employs multiple CART decision trees to arrive at a final prediction [54, 58]. Unlike a single CART tree, the RF algorithm assesses only a randomly selected portion of the predictors at each split. The classification process is applied by each decision tree, and a voting technique is employed to determine the predicted class, with the final prediction derived from the ensemble of trees [59].

Figure 4 illustrates the architecture of RF [30]. There are a few benefits of using RF. First, automated processes make tree implementation very simple. Second, it can adjust to new data with little to no additional modeling. Third, RF reduces worries about whether nonlinear effects or higher-order interactions among predictors should be considered, as well as about the correct modeling of relationships between predictors and the outcome. Fourth, RF has excellent predictive power. Despite these benefits, RF does have some drawbacks, one of which is that it is not as easily interpreted as a single tree, making it less obvious which variables are most significant [5] (Table 1).

Fig. 4
figure 4

The random forest classifier generates an initial set of bootstrap samples, one per tree (tree 1 to tree n). Predictions are then generated by majority voting over the classification trees

Table 1 Explanation of parameters contained in formulas

3.4.4 AdaBoost classifier (ADA)

The ADA algorithm is a machine learning technique that falls within the boosting algorithm family [62]. The objective of the ADA algorithm is to enhance the accuracy of classification by transforming a set of weak classifiers into a powerful classifier [62,63,64]. The algorithm functions by iteratively training a cohort of learners and ultimately combining their outputs to facilitate prediction [62]. The boosting algorithm utilizes a deliberate subset of the dataset as opposed to random sampling [65].

To categorize sets in the ADA algorithm, an initial iteration assigns uniform weights to all samples, and later iterations adjust the weights to better align with the data [62]. In repeated iterations, misclassified samples from prior iterations are assigned greater weight than correctly classified samples from the same iterations [62]. The diagram presented in Fig. 5 depicts the sequential process of the ADA algorithm, the combination system, and the weak learning algorithm utilized for default prediction. The steps of the ADA method can be summed up as follows [64], where all samples in the dataset (Ds) are initially assigned uniform weights (a minimal code sketch follows Fig. 5).

  1. The determination of whether a sample will be taken or not is contingent upon the weight of the sample.

  2. Sampling with replacement is employed to generate a training set from the dataset (Ds) based on the weights.

  3. The training set is subsequently utilized to train a classifier.

  4. A prediction loss evaluation assesses the efficacy of the trained classifier and determines an appropriate weight (w1) for it.

Fig. 5
figure 5

AdaBoost Classifier schematic with uniform weights for all dataset data points (Ds). Weight determines whether a sample is taken. Weight-based sampling with replacement creates a training set from Ds. Classifiers are trained using the training set. A prediction loss evaluation evaluates the trained classifier and determines its weight (w1)
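A minimal AdaBoost.M1-style sketch of the loop described above is given below; it uses direct sample reweighting rather than the weight-based resampling with replacement described in the text (the two variants are equivalent in spirit), and decision stumps, the number of rounds, and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    """Minimal AdaBoost sketch: uniform initial weights, weight-driven weak
    learner training, classifier weight from the weighted error, and
    up-weighting of misclassified samples for the next round."""
    n = len(y)
    w = np.full(n, 1.0 / n)                         # uniform weights over Ds
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)            # steps 1-3: weight-driven training
        miss = (stump.predict(X) != y).astype(float)
        err = np.clip(w @ miss, 1e-10, 1 - 1e-10)   # weighted prediction loss
        alpha = np.log((1 - err) / err)             # step 4: classifier weight (w1)
        w *= np.exp(alpha * miss)                   # up-weight misclassified samples
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas
```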

3.5 Data envelopment analysis (DEA)

DEA is a nonparametric linear programming model used to measure the efficiency of decision-making units (DMUs) such as banks and branches [6]. Essentially, efficiency is the ratio of a DMU's output to its input, calculated with the following equation [66]:

$$ \theta = \frac{{{\text{Output}}}}{{{\text{Input}}}} $$
(6)

The efficient frontier in DEA is linear and envelopes all DMUs. DEA defines 'efficient' DMUs as those on the efficiency frontier and 'inefficient' DMUs as those below it. DEA goes further: when there are multiple inputs or outputs, another efficiency measure is needed, and it can be difficult to identify the weight of each input or output, especially when there are several; linear programming solves this. DEA calculates frontiers using the Charnes-Cooper-Rhodes (CCR) and Banker-Charnes-Cooper (BCC) models, where constant returns to scale (CRS) is used in the CCR model and variable returns to scale (VRS) in the BCC model. CRS assumes constant returns to scale and a straight-line frontier, so a unit of input always yields the same output; VRS is piecewise linear and allows returns to scale that vary with the DMU's scale, which is why VRS is used in this study [66, 67]. The BCC model has output- and input-oriented forms [66]. Below is the input model (a small linear programming sketch follows its dual form):

$$ \min \theta_{B} $$
(7)

Subject to:

$$ \theta_{B} x_{0} - X\lambda { } \ge 0 $$
$$ Y\lambda \ge { }y_{0} $$
$$ e\lambda = 1 $$
$$ \lambda \ge 0 $$

where

\(\theta_{B} \) is a scalar.

\(X \) and Y are the vectors of the inputs and outputs respectively for all DMUs.

\(x_{0} \) and \(y_{0} \) are the vectors for the DMU being optimized.

\(\lambda \) is a vector of the coefficients for the inputs and outputs.

\(e\) is a row vector with all elements equal to unity.

The dual form is as follows:

$$ {\text{max}} z = y_{0} - u_{0} $$
(8)

Subject to:

$$ vx_{0} = 1 $$
$$ - v X + uY - u_{0} e \le 0 $$
$$ v \ge 0,u \ge 0, u_{0} \,\,{\text{free in sign}} $$

where X and Y are the vectors of the inputs and outputs, respectively, and \(v\) and \(u\) are the vectors of multiplier variables for the inputs and outputs.
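To illustrate the input-oriented BCC envelopment model above, the following is a small sketch that solves it with scipy.optimize.linprog; the column-per-DMU matrix orientation and the function name are illustrative assumptions, and this is not the DEA software actually used in the study (see the end of this section).

```python
import numpy as np
from scipy.optimize import linprog

def bcc_input_efficiency(X, Y, j0):
    """Input-oriented BCC (VRS) efficiency of DMU j0.
    X: (m, n) input matrix, Y: (s, n) output matrix, columns are DMUs."""
    m, n = X.shape
    s = Y.shape[0]
    x0, y0 = X[:, j0], Y[:, j0]
    # Decision vector z = [theta, lambda_1, ..., lambda_n]
    c = np.r_[1.0, np.zeros(n)]                      # minimise theta
    A_ub = np.vstack([
        np.c_[-x0.reshape(-1, 1), X],                # X lambda - theta x0 <= 0
        np.c_[np.zeros((s, 1)), -Y],                 # -Y lambda <= -y0
    ])
    b_ub = np.r_[np.zeros(m), -y0]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)     # e lambda = 1 (VRS convexity)
    b_eq = [1.0]
    bounds = [(None, None)] + [(0, None)] * n        # theta free, lambda >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.fun                                   # theta_B, the efficiency score
```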

Below is the output model:

$$ {\text{max}} \,\,\eta_{B} $$
(9)

Subject to:

$$ X \lambda \le x_{0} $$
$$ \eta_{B} y_{0} - Y\lambda { } \le 0 $$
$$ e\lambda = 1 $$
$$ \lambda { } \ge 0 $$

where \(\eta_{B}\) is the output-oriented efficiency score (a scalar).

The dual form is as follows:

$$ {\text{min}} z = vx_{0} - v_{0} $$
(10)

Subject to:

$$ uy_{0} = 1 $$
$$ v X - uY - v_{0} e \ge 0 $$
$$ v \ge {0},u \ge {0}, u_{{0}} \,\,{\text{free in sign}} $$

One major advantage of DEA is its capacity to simultaneously process many inputs and outputs, in contrast to regression analysis. Furthermore, DEA allows for the inclusion of inputs and outputs with varying units, enabling their joint analysis. Another benefit of DEA is that it does not necessitate any assumptions regarding the distribution of inefficiency or a particular functional form on the data in order to identify the most efficient DMU [2, 3]. On the other hand, the main disadvantage of DEA is that it ignores random error and presumes that any departure from the frontier is due to DMU inefficiency [10, 12]. Another limitation of DEA is that adding a new DMU requires rerunning the DEA to obtain its efficiency score. With the rise of big data, real-world DMU databases are growing rapidly; once DEA efficiency has been computed for many DMUs, rerunning the model to calculate the efficiency of a new DMU requires a great deal of memory and CPU time [12].

When working with a limited sample size, the accuracy of DEA decreases, so a minimum number of DMUs is required [3, 61]. The general rule of thumb is as follows [6]:

$$ n \ge \max \left\{ {m* s,3\left( {m + s} \right)} \right\} $$
(11)

where n is the number of DMUs, m the number of inputs, and s the number of outputs.

The methodology suggested for evaluating efficiency, as shown in Fig. 6, involves the two-stage DEA approach. VRS technology is used to compute the efficiency score of each DMU through the following stages:

  • For stage I, the inputs are fixed assets (X) and IT expenditure (I), and the output is deposits (D).

  • For stage II, the only input is deposits (D), which is also the main output of stage I (N).

Fig. 6
figure 6

Two-stage DEA model for efficiency score

Finally, for the overall stage, the profits made from investing the deposits in stage II and the percentage of performing loans (%L) are the two main outputs.

The input-oriented DEA software available at the developer's website (https://onlineoutput.com/dea-software/) was used to calculate the relative efficiency of the DMUs; further information on input-oriented DEA is available there.

3.6 Bagging classifier (BC)

The bagging classifier is also known as bootstrap aggregating [68, 69]. Bagging is a machine learning approach that generates random subsets from the original training dataset; these randomly generated training sets facilitate the aggregation of a collection of "weak learners" to form a "strong learner" [68]. Bagging is a simple method that has been observed to effectively decrease variance and enhance the accuracy of unstable classifiers. Additionally, the use of random features can contribute to improved accuracy and reduce the risk of overfitting. However, one drawback of the bagging technique is that the interpretability of individual trees is compromised when they are combined into a bagged ensemble [70]. The steps of the BC approach can be summarized as depicted in Fig. 7 [31, 70, 71], with a short code sketch following the figure.

  1. Data samples in the training datasets are colored.

  2. Samples are selected at random with replacement from the original dataset.

  3. Because sampling with replacement is used, some of the original samples may appear more than once, and others not at all, in the sets used to train the different classifiers.

  4. To classify an unknown sample, each classifier makes a prediction, which counts as one vote. The classification result is then determined according to the majority rule.

Fig. 7
figure 7

Structure of bagging classifier
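A minimal bagging sketch along the lines of the steps above is shown below; the decision-tree base estimator and hyperparameters are illustrative (the experiments in Sect. 4 swap in DT, ADA, RF, KNN, and SVC as base estimators), and X_train/y_train are taken from the split sketch in Sect. 3.1.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging as described above: bootstrap samples drawn with replacement,
# one classifier trained per sample, majority vote at prediction time.
bc = BaggingClassifier(DecisionTreeClassifier(),   # base estimator (swap for ADA, RF, KNN, SVC)
                       n_estimators=50,
                       bootstrap=True,
                       random_state=42)
bc.fit(X_train, y_train)
y_pred_bc = bc.predict(X_test)
```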

3.7 Inputs combinations

As stated in Table 2, this study examined four different combinations of data as inputs for the suggested model.

Table 2 Different inputs combinations used in this study

3.8 Model performance evaluation

In order to evaluate the efficacy of the models, four metrics are utilized for comparative purposes [2]. These metrics are accuracy (ACC) [2], precision or positive predictive value (PPV), sensitivity (also called recall, hit rate, or true positive rate, TPR) [68], and the F1 score [56].

$$ {\text{ACC}}\, \left( {{\text{accuracy}}} \right) = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} $$
(12)
$$ {\text{PPV}}\, \left( {{\text{precision}}} \right) = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} $$
(13)
$$ {\text{TPR}}\, \left( {{\text{recall}}} \right) = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} $$
(14)
$$ {\text{F}}1 = \frac{{2{\text{ *PPV * TPR}}}}{{\left( {{\text{PPV }} + {\text{ TPR}}} \right)}} $$
(15)

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. PPV is the precision or positive predictive value, and TPR is the sensitivity, recall, hit rate, or true positive rate.
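Equations (12)-(15) correspond directly to the standard scikit-learn metric functions, as in the following sketch; y_test and y_pred are assumed to come from the earlier sketches.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Eqs. (12)-(15) computed from true vs. predicted class labels
acc = accuracy_score(y_test, y_pred)
ppv = precision_score(y_test, y_pred)   # precision / positive predictive value
tpr = recall_score(y_test, y_pred)      # recall / sensitivity / true positive rate
f1 = f1_score(y_test, y_pred)
print(f"ACC={acc:.4f} PPV={ppv:.4f} TPR={tpr:.4f} F1={f1:.4f}")
```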

4 Experimental results and discussion

This part presents the experimental results obtained from the proposed model, with the objective of assessing its performance when utilizing different financial data as inputs. Subsequently, a comprehensive comparative study is presented, highlighting the outcomes obtained from our proposed model in comparison with the base learners. The current study employed four statistical metrics: accuracy, computed from the count of correct predictions, with a maximum achievable value of 1.0 and a minimum of 0.0; precision, determined by the proportion of correct positive predictions, with a best value of 1.0 and a worst of 0.0; recall, calculated from the proportion of actual positives that are correctly predicted, with a highest value of 1.0 and a lowest of 0.0; and the F1 score, a metric that assesses the accuracy of a test by taking into account both precision and recall [72]. The research objectives are assessed in conjunction with a range of input financial variables.

Experiment 1 Performance analysis of super learner (SL).

Aim The aim of the study compared the performance of the SL model with that of the DEA model during testing period, using various combinations of financial data.

Discussions and observations The study’s results are presented in Table 3, indicating that the model incorporating all financial variables (M1) exhibited the highest performance in terms of PPV, TPR, and F1 (0.9444, 1.000, and 0.9714, respectively) across all input combinations. When the percentage of performing loans is excluded from the bank's variables in M2, performance was lower on the statistical measures (0.8947 and 0.9444 for PPV and F1), with the exception of TPR (1.000), which remained unchanged. For M3, compared with M1 and M2, excluding the profit accrued from investing in securities gives relatively lower PPV, TPR, and F1 values (0.8889, 0.9412, and 0.9143). Moreover, with further reduced input variables, M4 uses the input combination of total IT expenditures, total fixed assets, and total deposits and yields PPV, TPR, and F1 of 0.9412, 0.9412, and 0.9412, respectively. M4 is better than M3 on all statistical measures even though the number of inputs is reduced. It was observed that the percentage of performing loans and the returns obtained from investment deposits do not significantly affect the results.

Table 3 Testing set results of proposed model and base learners

A diagrammatic representation of all the classifiers used on the dataset is shown in Fig. 8. The best SL result was obtained with the M1 inputs, exhibiting a high accuracy (ACC = 0.9545). Furthermore, the study found that removing %L from the M1 inputs resulted in a decrease of 4.99% in the ACC value in M2; specifically, the ACC values were 0.9545 and 0.9091 for M1 and M2, respectively. Furthermore, the ACC values obtained for the SL when utilizing the M2 inputs (total IT expenditures, total fixed assets, total deposits, and profit accrued from investing in securities) and the M4 inputs (total IT expenditures, total fixed assets, and total deposits) are identical, at 0.9091 and 0.9091, respectively, although the numbers of inputs differ. Additionally, M3's accuracy (ACC = 0.8636) is the lowest of all the models, lower than M1 by 10.5% and than M2 and M4 by 5.26%.

Experiment 2 Performance comparison of proposed model with base learners.

Aim Comparison of performance analysis of SL and base learners across input combinations.

Discussions and observations The performance of the base learners differed based on the input combinations, as shown in Table 3. When all financial variables are included in the M1 inputs, RF performs best on the PPV, TPR, and F1 measures, achieving scores of 0.8889, 0.9412, and 0.9143, respectively. Next, ADA's performance is lower than that of RF, with PPV, TPR, and F1 values of 0.8824, 0.8824, and 0.8824, respectively. Furthermore, SVM exhibits comparatively lower performance than RF and ADA in terms of PPV and F1, with values of 0.7619 and 0.8421, but it matches RF in terms of TPR, with a value of 0.9412, whereas ADA's TPR is lower, at 0.8824. Finally, KNN has the lowest performance across all input combinations, with PPV, TPR, and F1 for M1 of 0.8333, 0.5882, and 0.6897, respectively.

The empirical findings show that the performance of all algorithms with the M2 inputs remained consistent with that of the M1 inputs, with the exception of the RF algorithm, following the reduction in the number of inputs used. The RF's performance was superior with the M1 inputs relative to the M2 inputs, as shown by the lower PPV, TPR, and F1 values of 0.8750, 0.8235, and 0.8485, respectively, for the M2 inputs.

Replacing the variable (P) in the M2 inputs with the variable (%L) gives the M3 input combination. Using the M3 inputs with the KNN and RF algorithms resulted in increased performance compared with the M2 inputs. The KNN algorithm showed notable improvement in its performance metrics, specifically PPV, TPR, and F1 of 0.8462, 0.6471, and 0.7333, respectively. Additionally, RF performance was superior with the M3 inputs compared with the M2 inputs with respect to PPV, TPR, and F1, which were 0.8824, 0.8824, and 0.8824, respectively; however, this performance was still inferior to that with the M1 inputs in terms of PPV, TPR, and F1.

On the other hand, the SVM algorithm's performance with the M3 inputs is similar to that with the M2 and M1 inputs, as shown by the PPV, TPR, and F1 metrics of 0.7619, 0.9412, and 0.8421, respectively. Additionally, the ADA algorithm demonstrated weaker performance in terms of PPV and F1, with values of 0.8333 and 0.8571, respectively, while its TPR remained similar to that with the M2 inputs, at 0.8824.

When reducing the number of M1 inputs to produce the M4 inputs, which include the variables X, I, and D, the results show that the M4 values of PPV, TPR, and F1 were similar to those of the M1 inputs for the RF, SVM, and ADA algorithms. Specifically, the RF algorithm yielded 0.8889, 0.9412, and 0.9143 for PPV, TPR, and F1, respectively; the SVM algorithm produced 0.7619, 0.9412, and 0.8421; and the ADA algorithm gave 0.8824, 0.8824, and 0.8824. However, it was observed that the RF values with the M2 inputs were lower than those with the M4 inputs, while SVM and ADA yielded the same outcomes as with the M4 inputs. Similarly, the results obtained with the M3 inputs matched those obtained by SVM with the M4 inputs, but with lower RF and ADA values, except that the TPR value with the M3 inputs was equivalent to that with the M4 inputs. Regarding KNN, it exhibits better values with the M4 inputs than with the M1 inputs in terms of PPV, TPR, and F1 (0.8462, 0.6471, and 0.7333, respectively); the outcome with the M2 inputs is lower than that with the M4 inputs, whereas the M3 inputs give the same outcome as the M4 inputs.

On the other hand, Fig. 8 presents a graphical representation of the accuracy of all classifiers. The ACC of the KNN algorithm with the M3 and M4 inputs shows the same score of 0.6364, despite the different input variables, while the KNN accuracy with the M1 and M2 inputs is lower, with a score of 0.5909. The RF value remains constant at 0.8636 for both the M1 and M4 inputs, while it decreases with the M2 and M3 inputs, measuring 0.7727 and 0.8182, respectively. On the other hand, the accuracy of the SVM remains consistent across all combinations. Furthermore, the accuracy of the ADA model remains consistent at 0.8182 when using the M1, M2, and M4 input combinations, while its accuracy with the M3 inputs is the lowest across all input combinations, with a score of 0.7727.

Fig. 8
figure 8

Bar plots showing the results for base learners and super learners in terms of accuracy in M1(I, X, D, P,%L), M2 (I, X, D, P), M3(I, X, D,%L), and M4(I, X, D)

Finally, the performance of the SL model was superior across all input combinations compared with the results of the base learner models. The evaluation measures ACC, PPV, TPR, and F1 ranged from 0.8636, 0.8889, 0.9412, and 0.9143 to 0.9545, 0.9444, 1.000, and 0.9714, respectively. SL models exhibit superior performance in the evaluation of bank efficiency owing to their comparatively higher ACC, PPV, TPR, and F1 values relative to the other models. Moreover, the SL demonstrates the ability to produce accurate outcomes with limited financial data, namely M2, M3, and M4.

Experiment 3: Comparison proposed model with traditional DEA model.

Our suggested model was put to the test using performance metrics, and its efficiency was assessed by comparing its results with those of the traditional DEA model. The objective of the research in [73] was to determine the influencing factors of technical efficiency, resource analysis, and business efficiency in order to examine the efficiency of the Vietnamese banking system from 2014 to 2017; the study's sample consisted of 14 banks. The dataset utilized was obtained from the Kaggle website and was collected from the annual reports and audited financial statements of banks [74]. Under CRS, an increase in inputs results in a proportionate rise in outputs, whereas under VRS an increase in inputs does not cause a proportionate increase in outputs; the comparison has been evaluated based on one metric, accuracy (ACC) [75]. Table 4 shows that the SL is superior to DEA in terms of ACC, with a DEA score of 0.7905 and an SL score of 0.8333. With regard to the other statistical metrics included in the study, the SL results for PPV and F1 (0.8333 and 0.9091, respectively) outperform the DEA results (0.4772 and 0.6461, respectively), with the exception of TPR, which gives the same result (1.000) in both cases.

Table 4 Result of proposed model and traditional DEA

Experiment 4: Comparison between Proposed model and Bagging model.

Aim: Performance indicators were used on the testing set to assess the efficiency of our proposed model, and the outcomes were compared with those of alternative techniques such as the bagging classifier (BC). The BC model and our proposed model were compared using four different input combinations: (I, X, D, %L, P), (I, X, D, P), (I, X, D, %L), and (I, X, D). To compare and assess a model's accuracy and goodness of fit, data analysis and modeling commonly use the statistical metrics ACC, PPV, TPR, and F1. For all input combinations used in the comparison, Table 5 presents the results of the proposed model and of the BC model with different base estimators, expressed in terms of the four metrics.

Table 5 Result of super learners and bagging models with the base estimator difference for all input combinations, expressed in terms of four metrics

Discussions and observations: The results showed that the ACC, PPV, TPR, and F1 values of our proposed model outperformed those of the BC models with different base estimators for all input combinations, where the five base estimators of the BC model are DT, ADA, RF, KNN, and SVC. Our suggested model's values for ACC, PPV, TPR, and F1 are 0.9545, 0.9444, 1.000, and 0.9714, respectively. The results with RF as the base estimator were better than those with the other base estimators (0.9090, 0.8947, 1.000, and 0.9444, respectively), but not better than those generated by the proposed model. Next come the results from DT (0.8186, 0.8888, 0.9411, and 0.8888), ADA (0.8181, 0.8421, 0.9411, and 0.8888), KNN (0.8333, 0.8823, 0.8571), and SVC (0.7720, 0.7720, 1.000, and 0.8717, respectively) when using M1 with inputs I, X, D, %L, and P.

The ACC, PPV, TPR, and F1 values of our suggested model beat those of the BC models when the M2 inputs, consisting of I, X, D, and P, were used. Our proposed model's values for ACC, PPV, TPR, and F1 are 0.9091, 0.8947, 1.000, and 0.9444, respectively. Among the base estimators, utilizing ADA was superior to the proposed model in terms of PPV (0.9411) but inferior in terms of ACC, TPR, and F1 (0.9090, 0.9411, and 0.9411, respectively). Additionally, the outcomes with the other base estimators were worse than those of the proposed model: DT (0.8636, 0.9375, 0.8823, and 0.9090, respectively), RF (0.8636, 0.8888, 0.9411, and 0.9142, respectively), KNN (0.8181, 0.8823, and 0.8823, respectively), and SVC (0.7727, 0.7727, 1.000, and 0.8717, respectively).

Our proposed model performed better than the BC models when the variable (P) in the M2 inputs was replaced with the variable (%L), giving the M3 inputs consisting of I, X, D, and %L. Our proposed model's values for ACC, PPV, TPR, and F1 are 0.8636, 0.8889, 0.9412, and 0.9143, respectively; with the other base estimators, the results were lower than the proposed model when using RF (0.8181, 0.8823, 0.8823, and 0.8823, respectively), DT (0.7727, 0.875, 0.8235, and 0.8484, respectively), ADA (0.7727, 0.8333, 0.8823, and 0.8571, respectively), and KNN (0.6818, 0.8125, 0.7647, and 0.7878, respectively), while SVC performed better in TPR (1.000) than the proposed model but less well in ACC, PPV, and F1 (0.7727, 0.7727, and 0.8717, respectively).

When reducing the number of M1 inputs to produce the M4 inputs, which include the variables X, I, and D, the M4 values of ACC, PPV, TPR, and F1 for our proposed model are 0.9091, 0.9412, 0.9412, and 0.9412, respectively; with the other base estimators, the results were lower than the proposed model when using RF (0.8636, 0.8888, 0.9411, and 0.9142, respectively), DT (0.7727, 0.9285, 0.7647, and 0.8387, respectively), ADA (0.8181, 0.8823, 0.8823, and 0.8823, respectively), and KNN (0.8181, 0.8421, 0.9411, and 0.8888, respectively), while SVC performed better in TPR (1.000) than the proposed model but less well in ACC, PPV, and F1 (0.7727, 0.7727, and 0.8717, respectively).

In comparison with the BC models, our suggested model performed better overall across all input combinations included in the study. The current results suggest that the proposed model achieves a high degree of accuracy and may be used successfully for measuring the efficiency of banks.

5 Conclusions and future work

The data envelopment analysis (DEA) method has been used for assessing the efficiency of decision-making units (DMUs), but calculating the efficiency of newly added DMUs requires rerunning the model, which demands a great deal of memory and CPU time. Given the wide use of machine learning technology, academic researchers have begun investigating the combination of intelligent algorithms with the DEA approach to augment predictive precision. This study proposed a high-performance ensemble learning model for assessing the efficiency of banks using limited financial data. The ensemble method, called the super learner technique, is based on cross-validation theory and includes four base learner models: KNN, SVM, RF, and ADA. Nevertheless, the outcomes could still be enhanced in several ways:

  • Additional variables that may affect the efficiency and performance of banks, such as liquidity ratio, could be included as predictive factors in forthcoming research.

  • To improve the accuracy of the model, future research could investigate alternative base learner models or scale up the number of these models.

  • It is possible to compute the efficiency using a variety of statistical methods in addition to utilizing other models that are distinct from those presented in this study.