Introduction

Banks play a critical role in financial intermediation, serving as a channel for allocating capital in the economy and reducing the cost of monitoring borrowers. According to Freixas and Rochet [12], banks’ primary function is to grant loans to borrowers and receive deposits from savers. As a result, a sound financial system is fundamental to sustainable economic growth, with empirical and theoretical evidence supporting a positive correlation between financial development and economic growth [14, 20].

The modern theory of financial intermediation is based on market failures resulting from asymmetric and/or imperfect information among agents such as borrowers and savers. According to Paula [24], banks manage such failures more efficiently through monitoring. However, financial intermediation involves inherent costs and risks, such as credit risk, liquidity risk, and market risk. Banks should be remunerated adequately for bearing these risks, and the bank spread (the difference between the rate charged to borrowers on loans and the rate paid to savers on deposits) provides that compensation. The level of these spreads affects credit supply and the dynamics of economic growth, as noted by Stiglitz and Weiss [32].

Given the importance of banks in financial intermediation, their profitability is a topic of discussion in the literature. Demirgüç-Kunt and Huizinga [10], Dick [11], and Jiang et al. [16] present empirical evidence that economic growth and high interest rates increase banks’ profitability, impacting the volume and profitability of credit operations. Athanasoglou et al. [4] conclude that bank expenditure and management efficiency are significant determinants of profitability. García-Herrero et al. [13] suggest caution in analyzing bank profitability, as it may be associated with market concentration, while Maudos and Solís [22] point out that the banking sector should be efficient and competitive to promote savings and investment.

The literature on bank profitability determinants is a comprehensive and well-studied field, covering several aspects inherent to banking activity. Typically, researchers categorize these determinants into three categories. The first category pertains to variables applicable to the banks’ operational activity, such as profitability margins, the size of the operation, growth in financial intermediation revenues, default indicators, operating income-to-total assets, ratio of defaulted loans to total loans, and quality management [31]. The second category of studies in this field concentrates on bank size and market dynamics, aiming to assess the degree of market concentration and competitiveness [15]. Finally, the third and last category addresses macroeconomic variables, such as interest rates, foreign exchange rates, the growth of gross domestic product and inflation [9, 27]. Also, some studies utilize a mix of macro and internal factors [1, 2, 26]. By considering these three categories of variables, researchers can gain a better understanding of the factors that influence banking activity and ultimately improve the banking industry’s performance.

In the literature, there is some consensus that the conditional distribution of bank profitability does not have a standard shape. Oliveira et al. [23] argued that excluding outliers in statistical analysis may not be the best approach, as it overlooks the heterogeneity of the conditional distribution of variables. In this context, quantile regression provides an ideal tool to examine the impact of determinants at different levels of profitability, considering this heterogeneity. In summary, quantile regression can provide nuanced insights into the determinants of bank performance at different percentiles of the distribution. Several studies, such as Chowdhury et al. [6] and Covas et al. [7], used quantile regression models to assess the determinants of bank profitability. Li [21] used quantile regression to estimate the risk-return ratio of American listed banks. The results showed differences in how banks with high profitability and low profitability respond to credit risk, providing insights into risk-return dynamics. Koutsomanoli-Filippaki and Mamatzakis [19] evaluated the European Union banking sector’s efficiency, risk, and return using quantile regression. They found that quantile regression models provided different results compared to traditional ordinary least squares (OLS) regression models, especially in the tails of the distribution.

In this study, we investigate bank profitability using the quantile regression approach, which allows us to assess the influence of the determinants at different percentiles of the sample and to identify how they behave for low-, medium-, and high-profitability banks. We used data from the Orbis database, maintained by Bureau Van Dijk, covering the 1,200 institutions with the highest market value in 101 countries. Our research design involved estimating quantile regression models with panel data, using bank fixed effects and time effects. To our knowledge, this study is the most comprehensive in terms of the number of banks and countries involved, which supports a broader understanding of the determinants of bank profitability. Additionally, to our knowledge, no other study has used dynamic panel quantile regression while simultaneously addressing survivorship bias in the sample. Another important contribution of the study is the finding that inflation has a positive impact on bank profitability. This effect has remained controversial among previous studies that consider only one or a few countries in their samples.

The results obtained through quantile regression analysis corroborate the existing literature. Bank size and capital adequacy had a negative impact, while market value had a positive impact on higher profitability banks. Credit risk had a negative impact on lower profitability banks and a positive impact on higher profitability banks. The inflation rate influenced only the higher profitability banks. Overall, this study offers valuable insights into the factors determining bank profitability and how they behave at different percentiles in the sample, suggesting the importance of bank efficiency and competition in promoting economic growth. Further research is needed to isolate the specific metrics that matter for maximizing shareholder returns.

Methods

The present work seeks to evaluate how the determinants of bank profitability behave among the different percentiles of the sample, in order to identify how banks at different levels of profitability respond to the determinants defined by the literature, from 2011 to 2018, using the quantile regression methodology.

Database

The data used in this work were extracted from the Orbis database maintained by Bureau Van Dijk, a subsidiary of Moody’s Analytics. The economic and financial indicators come from the annual financial statements, between 2011 and 2018, of the 1,200 institutions with the largest market capitalization, distributed across 101 countries and excluding China.

We chose not to exclude institutions with incomplete observations, in order to avoid selection bias. Nor did we adjust for outliers, following Koutsomanoli-Filippaki and Mamatzakis [19] and Oliveira et al. [23], because quantile regression is robust to them: it evaluates specific percentiles rather than only the mean of the observations.

Dependent variable

In the literature reviewed in this work, bank profitability is typically represented by the return on assets (ROA) and the return on equity (ROE), as well as their variants. ROA measures the company’s ability to generate results from a given level of assets, before the effects of debt financing, while ROE measures the profitability attributable to the shareholder, that is, net profit after taxes and financial expenses [8, 30].

Independent variables

Following the literature, and to better organize the study, the independent variables are divided into two groups: (i) bank-specific variables, which capture operational parameters of each bank assessed; and (ii) macroeconomic variables, which control for and evaluate the different macroeconomic environments of each country.

Bank Size (tam): As suggested by Petria et al. [26], and Tabak et al. [34], it consists of the logarithm of Total Assets in US dollars, having an ambiguous expected relationship in bank profitability, whether motivated by the hypothesis that smaller banks use niche strategies to seek greater profitability, or larger banks use their size to strengthen their capabilities in search of greater profitability.

Capital Adequacy (cap): As suggested by Albulescu [2], Petria et al. [26], Sayani et al. [31] and Tabak et al. [34], consists of the ratio of Shareholders’ Equity and Total Assets in percentage, having an ambiguous expected relationship motivated by the hypothesis that banks with greater capital adequacy would be able to offer lower returns on their deposits because they are safer and, consequently, more competitive, or even by the hypothesis that banks with greater capital adequacy would be under-leveraged, not reaching the maximum expected profitability.

Credit Risk (crisco): As suggested by Albulescu [2], Petria et al. [26], Sayani et al. [31] and Tabak et al. [34], it consists of the ratio between Loans in Default and Total Loans as a percentage, having a negative expected relationship with profitability for a straightforward reason: the greater the amount of loans in default, the greater the expected losses and the lower the result of the bank analyzed.

Management Efficiency (efici): As suggested by Albulescu [2], Petria et al. [26] and Primo et al. [28], consists of the ratio between Total Costs and Total Revenues as a percentage, having a negative expected relationship with profitability. That means banks with higher costs to generate the same monetary unit of revenue tend to obtain lower profitability compared to banks with better cost management efficiency.

Liquidity Risk (lrisco): As suggested by Petria et al. [26] and Sayani et al. [31], it consists of the ratio between Loans and Deposits as a percentage, having a positive expected relationship with profitability, reflecting the banks’ ability to expand the credit supply based on existing deposits. It is a risk factor for banks and highly regulated by central banks in the world.

Market Value (mcap): As suggested by Demirgüç-Kunt and Huizinga [9], it consists of the natural (Napierian) logarithm of the banks’ market value in thousands of US dollars. In line with bank size as measured by total assets, an ambiguous relationship is expected for this variable.

Inflation (infl): As suggested by Demirgüç-Kunt and Huizinga [9], Petria et al. [26], Primo et al. [28], and Tabak et al. [34], it consists of the annual percentage change in the consumer price index for each country. According to Alhadeff [3], inflation can adversely affect banks’ profitability by increasing the cost of funds and reducing the real value of assets, leading to lower net interest margins and profitability. Conversely, in China, inflation has shown a positive relationship with bank profitability [36]. In Bangladesh, Sri Lanka, and Pakistan, inflation was found to have no significant impact on bank profitability [33]. Therefore, the effect of inflation on bank profitability remains unsettled in the existing literature. We expect an ambiguous relationship for this variable, given that it depends on the ability of banks to operate at different levels of inflation and on how consumers react to price changes.

Economic Growth (cresc): As suggested by Demirgüç-Kunt and Huizinga [9], Petria et al. [26], and Tabak et al. [34], it consists of the annual percentage change in gross domestic product per capita for each country. A positive relationship is expected for this variable, given the prospect of increased demand for investment and credit.

Basic Interest Rate (i): As suggested by Demirgüç-Kunt and Huizinga [9], Primo et al. [28], and Tabak et al. [34], it consists of the basic interest rate of the economy, one of the main instruments of monetary policy, which financial agents use as the reference rate for pricing their operations. An ambiguous result is expected for this variable: at high levels, borrowers face a greater debt service and, eventually, greater difficulty in avoiding default. At the same time, at high levels, the banks’ ability to increase the bank spread is enhanced, since borrowers evaluate the total cost of the operation and not just the applicable spread.

Table 1 provides a summary of all selected variables, with measurement unit, proxy, and expected relationship:

Table 1 Description of variables
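As a rough illustration of how the bank-specific variables described above could be derived from raw statement items, the following Python sketch computes them for a single bank-year. The field names, the sample figures, and the use of the natural logarithm for tam are assumptions made for illustration; they are not taken from Orbis or from the study's data.

```python
import math
from dataclasses import dataclass


@dataclass
class BankYear:
    """Raw items for one bank-year (hypothetical field names)."""
    total_assets_usd: float
    equity_usd: float
    loans_in_default_usd: float
    total_loans_usd: float
    total_costs_usd: float
    total_revenues_usd: float
    deposits_usd: float
    market_value_kusd: float  # market value in thousands of US dollars
    net_income_usd: float


def build_variables(b: BankYear) -> dict:
    """Return the dependent variable (roe) and the bank-specific regressors."""
    return {
        "roe": 100 * b.net_income_usd / b.equity_usd,              # % of equity
        "tam": math.log(b.total_assets_usd),                       # log base is an assumption
        "cap": 100 * b.equity_usd / b.total_assets_usd,            # equity / assets, %
        "crisco": 100 * b.loans_in_default_usd / b.total_loans_usd,  # defaulted / total loans, %
        "efici": 100 * b.total_costs_usd / b.total_revenues_usd,   # costs / revenues, %
        "lrisco": 100 * b.total_loans_usd / b.deposits_usd,        # loans / deposits, %
        "mcap": math.log(b.market_value_kusd),                     # natural log
    }


# Illustrative figures only, not drawn from the sample.
example = BankYear(
    total_assets_usd=1e9,
    equity_usd=1e8,
    loans_in_default_usd=2e7,
    total_loans_usd=5e8,
    total_costs_usd=6e7,
    total_revenues_usd=1e8,
    deposits_usd=6.25e8,
    market_value_kusd=5e5,
    net_income_usd=1.2e7,
)
variables = build_variables(example)
```

The macroeconomic variables (infl, cresc, i) are country-level series and would be merged in by country and year rather than computed from bank statements.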

Econometric model

Quantile regression is an econometric technique developed by Koenker [18] and extended to longitudinal data by Koenker [17]. It is applied in this study because it is more robust to outliers and handles heterogeneity appropriately when the conditional distribution of the dependent variable is not homogeneous. As a result, the estimated coefficients may take different values across percentiles.

According to Koenker [18], the quantile regression method is an extension of the classic linear regression model, given that the OLS estimator focuses on only one central tendency measurement, while the quantile regression evaluates the distribution of the dependent variable, conditional to the set of explanatory variables for each percentile.

The model is estimated as follows. Let \(\left({y}_{i},{x}_{i}\right)\), \(i=1,2,\dots ,n\), be a sample from the population, where \({x}_{i}\) is the vector of independent variables and \({y}_{i}\) is the dependent variable. Assuming that the \(\theta\)-th percentile of the conditional distribution of \({y}_{i}\) is linear in \({x}_{i}\), the conditional percentile can be represented in the regression model as follows:

$$y_{i} = x_{i}^{\prime } \beta_{\theta } + \varepsilon_{i\theta }$$

$$Quan{t}_{{y}_{i}}\left(\theta |{x}_{i}\right)\equiv \text{inf}\left\{y :{F}_{i}\left(y|x\right)\ge \theta \right\}={x}_{i}^{\prime }{\beta }_{\theta }$$

and

$$Quan{t}_{{\varepsilon }_{i\theta }}\left(\theta |{x}_{i}\right)=0$$

where \(Quan{t}_{{y}_{i}}\left(\theta |{x}_{i}\right)\) denotes the \(\theta\)-th conditional percentile of \({y}_{i}\), conditional on the regressor \({x}_{i}\), \({\beta }_{\theta }\) is the unknown vector of parameters to be estimated for the different values of \(\theta\) in (0,1), \({\varepsilon }_{i\theta }\) is the error term and \({F}_{i}\left(y|x\right)\) is the cumulative distribution function, conditional on \(x\). The estimator of \({\beta }_{\theta }\) is obtained by solving the following problem:

$$\underset{{\beta }_{\theta }}{\text{min}}\sum_{i=1}^{n}{\rho }_{\theta }\left({y}_{i}-{x}_{i}^{\prime }{\beta }_{\theta }\right)$$

where \({\rho }_{\theta }\) is the loss function defined as:

$${\rho }_{\theta }\left(u\right)=\left\{\begin{array}{ll}\theta u, & \text{if } u\ge 0\\ \left(\theta -1\right)u, & \text{if } u<0.\end{array}\right.$$
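To make the role of the loss function concrete, the sketch below implements the piecewise definition of \({\rho }_{\theta }\) and shows that, with no regressors, minimizing the summed loss over a constant recovers a sample percentile. The function names and the ROE-like figures are illustrative assumptions, not values from the study.

```python
def check_loss(u: float, theta: float) -> float:
    """rho_theta(u): theta * u when u >= 0, (theta - 1) * u when u < 0."""
    return theta * u if u >= 0 else (theta - 1.0) * u


def total_loss(q: float, ys: list, theta: float) -> float:
    """Summed check loss of the residuals y - q for a constant fit q."""
    return sum(check_loss(y - q, theta) for y in ys)


def quantile_by_minimization(ys: list, theta: float) -> float:
    """Search the observed values for the constant with the lowest total loss.

    The objective is piecewise linear and convex in q, so a minimizer is
    always attained at one of the data points.
    """
    return min(sorted(ys), key=lambda q: total_loss(q, ys, theta))


# Illustrative ROE-like values in percent (not from the paper).
roe = [2.0, 8.0, 11.0, 13.0, 18.0, 25.0, 40.0]
median = quantile_by_minimization(roe, 0.5)   # theta = 0.5 gives the median
q90 = quantile_by_minimization(roe, 0.9)      # upper-tail percentile
```

Positive residuals are weighted by \(\theta\) and negative residuals by \(1-\theta\), which is what pulls the fitted value toward the upper tail as \(\theta\) approaches 1.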

To handle panel data, Koenker [17] extended the original approach with the quantile regression model with fixed effects, in which an individual effect \(\alpha\) captures the variability attributable specifically to each individual, that is, unobserved heterogeneity.
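For reference, the penalized fixed-effects estimator is usually written as the minimization problem below. This is a sketch in the notation of this section, assuming that [17] refers to Koenker's penalized estimator for longitudinal data; the weights \({w}_{k}\), the penalty parameter \(\lambda\), and the indices \(i\) (bank) and \(t\) (year) are not defined in the original text.

$$\underset{\left(\alpha ,\beta \right)}{\text{min}}\sum_{k=1}^{q}\sum_{i=1}^{n}\sum_{t=1}^{{T}_{i}}{w}_{k}\,{\rho }_{{\theta }_{k}}\left({y}_{it}-{\alpha }_{i}-{x}_{it}^{\prime }\beta \left({\theta }_{k}\right)\right)+\lambda \sum_{i=1}^{n}\left|{\alpha }_{i}\right|$$

Here \({\alpha }_{i}\) is the fixed effect of bank \(i\), shared across the \(q\) percentiles \({\theta }_{1},\dots ,{\theta }_{q}\) that are estimated jointly. As \(\lambda\) approaches zero the fixed effects are left unpenalized, and as \(\lambda\) grows they are shrunk toward zero, which approaches a pooled model without individual effects.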

The covariance matrix of the vector of regression parameters, for the given percentiles, is estimated by means of bootstrap replications, according to Buchinsky [5] and Koenker [17]. Because we use a dynamic panel, estimation takes place using two-stage least squares (2SLS), according to Wooldridge [37]. The models estimated by OLS and by dynamic panel data with fixed effects [29] are presented to compare the results with those of quantile regression (QR).
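The bootstrap step can be illustrated with a deliberately small example. The sketch below fits a single-regressor quantile line exactly by enumerating pairs of observations (a basic solution of the underlying linear program passes through at least two points) and then estimates the standard error of the slope by resampling observations with replacement. It is a simplified cross-sectional stand-in under stated assumptions: the data are synthetic, the function names are hypothetical, and neither the 2SLS stage nor the penalized fixed effects of the actual model are reproduced. A panel application would more plausibly resample whole banks, which is not shown here.

```python
import random
import statistics
from itertools import combinations


def check_loss(u, theta):
    """rho_theta(u) as defined in the text."""
    return theta * u if u >= 0 else (theta - 1.0) * u


def fit_quantile_line(xs, ys, theta):
    """Exact fit of y = a + b * x at percentile theta for small samples.

    Enumerates every line through two observations with distinct x and
    keeps the one with the lowest summed check loss.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define a slope
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(check_loss(y - a - b * x, theta) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two observations with distinct x")
    return best[1], best[2]


def bootstrap_slope_se(xs, ys, theta, reps, seed=0):
    """Bootstrap standard error of the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        _, slope = fit_quantile_line(
            [xs[k] for k in idx], [ys[k] for k in idx], theta
        )
        draws.append(slope)
    return statistics.stdev(draws), draws


# Synthetic heteroskedastic data (illustrative only).
rng = random.Random(42)
xs = [float(x) for x in range(1, 21)]
ys = [1.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.gauss(0.0, 1.0) for x in xs]

# 200 replications keep the sketch fast; the study reports 1,000.
se_median, draws = bootstrap_slope_se(xs, ys, theta=0.5, reps=200)
```

The pair enumeration grows quadratically with the sample size, so it is only suitable for toy data; a real application would rely on a linear-programming or interior-point solver.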

Results and discussion

Table 2 shows the descriptive statistics for the selected variables, considering the database used in the econometric modeling. The bank and year variables are used as indexes for the individual and time fixed effects, covering 1,200 banks over the period from 2011 to 2018.

Table 2 Summary of descriptive statistics

Table 3 shows the selected variables at multiple percentiles. The range of the observations for the ROE variable is much wider than the mean and the confidence interval suggest, even when considering only the first and third quartiles, that is, the interquartile range. The mean of the dependent variable ROE is 12.7%, with the first quartile at 8.25% and the third quartile at 18.26%.

Table 3 Percentiles for the selected variables

Table 4 shows the correlations between the selected variables, which are weak for almost all pairs, except for Bank Size and Market Value and for Basic Interest Rate and Inflation, as would be expected.

Table 4 Pearson correlation for selected variables

It is worth highlighting the relation between bank size, capital adequacy, and liquidity risk, which indicates that larger banks are less capitalized; that is, the expansion of total assets was not accompanied by growth in the banks’ net worth. Likewise, the relation between ROE and management efficiency suggests that efficient cost management brings results in terms of ROE, a factor corroborated by Athanasoglou et al. [4]. Another important observation is the change in sign between the two periods in the relationship between economic growth, inflation, and the basic interest rate, suggesting that economic growth stops following inflation and interest rates and becomes inversely related to these two variables.

Table 5 presents the results of the model estimated via quantile regression (QR) by 2SLS, with the lagged dependent variable as an instrument, for panel data with penalized fixed effects according to Koenker [17]. For comparative purposes, models estimated by ordinary least squares (OLS) and by panel regression with fixed effects (FE) are also presented.

Table 5 Results of quantile regression models (2SLS), OLS Regression and FE regression

Starting with lagged profitability, the parameter is statistically significant and, as expected, positive for all percentiles and models (QR, OLS, and FE). The estimates differ between the extreme percentiles (10% × 90%) and between the mean and the 90% percentile, suggesting that the banks with the highest profitability tend to show less persistence.

For bank size, the parameters obtained are statistically significant and negative in all percentiles and models, and their magnitude differs between the extreme percentiles (10% × 90%) and between the mean and the 90% percentile. The impact of size stands out at the 90% percentile, indicating that bank size affects high-profitability banks more. In addition, comparing the distributions of Total Assets, the values for Level 5 banks (90% percentile) are consistently lower than those for banks at the other levels. Both results indicate that investment or specialized (niche) banks, which are smaller in terms of assets than commercial banks, tend to have higher profitability than other banks. Such results corroborate the findings of Henriques et al. [15] and Tabak et al. [34, 35].

Capital adequacy is statistically significant with a negative sign in all percentiles and models except for the FE model, and it shows the same behavior across all analyzed percentiles. That is, its impact on profitability is the same regardless of the bank’s level of profitability. It is worth mentioning that this metric has been growing over time at all levels of profitability, so that Shareholders’ Equity represents a higher share of Total Assets in the banks analyzed. This reflects the tightening of regulatory policies in response to the 2008 global financial crisis and Basel III, a result consistent with Pennacchi and Santos [25].

Credit risk is statistically significant in all tail percentiles (10%, 25%, 75%, and 90%). However, it is worth noting the sign change, where banks with low profitability have a negative relationship with Credit Risk, while banks with high profitability show a positive relationship, indicating a differentiation in the risk and return metric. Li [21], who also applied the Quantile Regression methodology, found the same relationship, while Petria et al. [26] and Sayani et al. [31], who use conditional mean in their studies, only evaluated the negative factor of the metric in question.

Considering management efficiency, the results are statistically significant and negative in all percentiles and models, and statistically equal across percentiles. However, evaluating the distribution of this metric together with the results shows that banks have made efforts to improve it by reducing operating costs, in line with the digitization of banking activities, the reduction in the number of branches, and tighter cost management. Efficiency has increased at all levels of profitability, but the gains have been sufficient only to maintain profitability, not to raise it further; these results are consistent with Petria et al. [26].

Evaluating the liquidity risk metric, the significance of the parameters obtained in the models is mixed, being statistically significant and positive only for the 50% and 75% percentiles. It is possible that part of the variability of this parameter was captured by the capital adequacy metric, with loans being a proxy for Total Assets, and deposits the inverse of Shareholders’ Equity. Petria et al. [26] indicate this variable as statistically significant in their studies, although the quantile regression results indicate significance for only two percentiles.

The results for market value are statistically significant and positive in all percentiles. Although they differ only between the extreme percentiles (10% × 90%), sensitivity increases across the percentiles, and the results are consistent with Pennacchi and Santos [25].

Inflation rates decrease more sharply in the higher percentiles. In these percentiles there was also a tendency for profitability to fall; that is, given the positive linear relationship for this parameter, the reduction in inflation implies a reduction in profitability, more sharply so in the highest percentiles. Primo et al. [28] suggest a negative and weak relationship, whereas Demirgüç-Kunt and Huizinga [9] and Tan and Floros [36] identify a positive relationship, endorsing the results of the presented model. We believe that our result is more robust because it involves a larger number of banks and countries than previous studies, and we consider this to be an important contribution of our work.

For the two other economic variables, economic growth and interest rate, the results indicate significant parameters in no more than two percentiles. In the case of the interest rate, it is possible that part of the variability was captured by the Inflation metric, given the high correlation between the variables. It should be emphasized that the literature reports mixed results for this variable and that in several studies the database covers only one country or a group with similar characteristics, which differs from the database of this work.

The set of graphs presented in Fig. 1 demonstrates the variability of the coefficients as a function of the percentiles in comparison with the conditional average models (OLS and FE).

Fig. 1 Behavior of Estimated Parameters for Selected Variables as a function of Percentiles

Among the main results, it is worth highlighting the contribution of the quantile regression model, especially for bank size and credit risk. For bank size, the loss in profitability associated with an expansion of banks’ Total Assets, whether due to an increase in the number of branches or an expansion of the credit portfolio, becomes larger across the percentiles. Thus, the profitability penalty for a one-unit increase in Total Assets is greater in banks with high profitability. For credit risk, the risk-return logic differs between banks with high profitability and banks with low profitability. That is, for banks with high profitability there is an incentive to increase credit risk in search of higher returns, whereas banks with low profitability face a disincentive to take on the same credit risk, given the expected reduction in profitability.

In addition, coefficient equality tests were conducted using the standard errors obtained via 1,000 bootstrap replications. The main comparisons can be seen in Table 6.

Table 6 Equality Test of estimated coefficients between percentiles of QR
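One common way to test equality between percentiles from bootstrap output is a Wald-type statistic on the difference between two coefficients. The sketch below assumes that form, which the text does not spell out; the function name, its arguments, and the numbers are hypothetical and are not taken from Table 6.

```python
import math
import statistics


def equality_test(est_low, est_high, draws_low, draws_high):
    """Wald-type z test of H0: beta(theta_low) == beta(theta_high).

    draws_low[k] and draws_high[k] must come from the same bootstrap
    resample k, so that the dependence between the two estimates is kept.
    """
    diffs = [h - l for l, h in zip(draws_low, draws_high)]
    se_diff = statistics.stdev(diffs)           # bootstrap SE of the difference
    z = (est_high - est_low) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return z, p_value


# Hypothetical bank-size coefficients at the 10% and 90% percentiles.
draws_p10 = [-0.9, -1.1, -1.0, -0.8, -1.2]
draws_p90 = [-2.1, -1.9, -2.0, -2.3, -1.7]
z, p_value = equality_test(-1.0, -2.0, draws_p10, draws_p90)
```

Five draws are used only to keep the arithmetic easy to follow; with the 1,000 replications mentioned in the text, the same function applies unchanged.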

Conclusions

The banking industry plays a crucial role in economic development by facilitating financial intermediation from savers to investors and ensuring liquidity in the economy. The discussion on bank profitability is essential as excessive profits can limit economic growth, while low profits can hinder market development.

The studies found in the literature have limitations regarding sample size, both in terms of the number of banks and the number of countries involved. Additionally, many previous works did not take into account the difference in distribution among the different percentiles of bank profitability. Our study reaffirms the determinants of bank profitability using Quantile Regression, enhancing understanding, and complementing existing models with insights into at least three profitability determinants. We analyze data from 1200 institutions across 101 countries to develop a two-stage Quantile Regression econometric model, using the lagged dependent variable as an instrument for panel data with penalized fixed effects to examine the determinants of bank profitability across five percentiles.

Six specific banking factors and three macroeconomic factors were selected from existing literature. Among the specific factors, five showed statistically significant results, with bank size and credit risk standing out. For bank size, higher profitability banks faced more pressure from increased Total Assets compared to their less profitable counterparts. Regarding credit risk, high profitability banks showed different risk-return dynamics, with incentives to expand higher-risk loan portfolios, while low profitability banks did the opposite. Among macroeconomic factors, only the Inflation rate exhibited significant differences across percentiles, indicating that high profitability banks are more responsive to inflation changes than low profitability banks.

Our results suggest that niche or specialized banks tend to have higher profitability than other banks. So, policymakers should encourage market competition to foster efficiency and innovation. Policies that reduce barriers to entry and encourage new entrants can also lead to better services and lower costs for consumers. Additionally, regulatory frameworks should emphasize the importance of robust credit risk assessment and adopt stricter capital adequacy requirements for bigger banks to maintain a balance between growth and risk. Lastly, both policymakers and managers should be vigilant in enhancing risk control for highly profitable banks, as our results suggest that there is a natural incentive for these banks to increase the credit risk in their portfolios.

We believe that the main limitation of this study is the lack of a comparative analysis between the pre- and post-pandemic periods. How might remote work and the increased digitization of banking services have altered the impact of the determinants across different levels of bank profitability? As a suggestion for further research, the significant metrics of this work (bank size, capital adequacy, credit risk, and management efficiency) can be evaluated in isolation to verify which corporate decisions should be pursued to maximize the return to the shareholder. We also suggest examining whether the relationship patterns between variables have changed in the post-pandemic period compared to the pre-pandemic period, once a sufficiently long time series is available.