1 Introduction

The 2007–2009 global financial crisis brought to light several cases of financial accounting misreporting in the banking industry. The U.S. Securities and Exchange Commission (SEC)’s push to step up its policing of accounting fraud led to a surge of cases and investigations. For example, First National Community Bancorp’s methodology for determining the amount of impairment on securities was found not to comply with Generally Accepted Accounting Principles (GAAP), and, hence, the bank understated losses in 2009 and 2010 (Securities and Exchange Commission (SEC) 2015). Similarly, Bank of America Corp. failed to deduct certain realised losses when it calculated and reported its regulatory capital, meaning that capital was overstated in its SEC filings from 2009 to 2013 (Securities and Exchange Commission (SEC) 2014). In the same vein, Fifth Third Bancorp failed to record substantial losses during the crisis by not properly accounting for a portion of its non-performing loan portfolio (Securities and Exchange Commission (SEC) 2013).

In several instances, auditors failed to alert the authorities to considerable discrepancies in banks’ books. The Federal Deposit Insurance Corporation (FDIC) sued PricewaterhouseCoopers (PwC) for $1 billion for not detecting a massive accounting fraud that brought down Colonial Bank in 2009. PwC was blamed for missing huge holes in Colonial’s balance sheet and for never detecting the multibillion-dollar fraud at Taylor Bean & Whitaker Mortgage Corp., Colonial’s largest client. Furthermore, two KPMG auditors received suspensions for failing to scrutinise loan loss reserves at TierOne Bank, which also went bankrupt during the crisis. Along the same lines, just eight months prior to the demise of Lehman Brothers, Ernst & Young’s auditors remained silent about the repurchase transactions that significantly disguised the bank’s leverage.

This paper evaluates the financial statements of U.S. banks by employing a mathematical law established by Frank Benford (Benford 1938) to detect data manipulation. If an accounting figure is manipulated upwards so as to increase the digit in the first position by one, a higher than anticipated proportion of low digits and a lower than expected frequency of high digits will occur in the second position. This constitutes an uncommon digital pattern that violates Benford’s Law and indicates upward data manipulation; if the converse holds, the data are subject to downward manipulation. An upward manipulation pattern suggests that when an accounting figure starts with the digit eight or nine, it is more likely to be rounded up to a figure that has a one as its first digit. Under downward manipulation, a figure whose first digit is one is lowered to a figure that starts with the digit nine.

Despite substantial progress in the areas of financial misreporting and accounting fraud, existing methods and metrics have several deficiencies that limit their usefulness. Benford’s Law is deemed superior to traditional approaches because it overcomes some of the key concerns surrounding them. Prior work has utilised Benford’s Law to identify accounting fraud and data manipulation in corporate firms (e.g., Alali and Romero 2013). Prior literature has also applied Benford’s Law in the realm of financial audits (Durtschi et al. 2004), taxes (e.g., Nigrini 1996), macroeconomic indicators (e.g., Gonzales-Garcia and Pastor 2009), and interest rates (e.g., Ashton and Hudson 2008). However, no paper has applied the Law to identify irregularities in financial reporting in the banking industry, making this a key innovation of our study.

Banks are special entities from various perspectives and, as such, their accounting books require special attention. The banking sector is one of the most heavily regulated sectors in the economy, and a key regulatory measure is the capital adequacy requirement that provides loss absorbency. Troubled banks have incentives to overstate equity and regulatory capital, particularly during economic downturns (e.g., Vyas 2011; Huizinga and Laeven 2012). Furthermore, since banks’ capital requirements are linked to the riskiness of their portfolios, bank accountants face incentives to overstate the quality of assets and understate the relevant risks and potential losses. Banks’ high leverage provides additional incentives to remain undercapitalised, since the benefits of issuing new equity primarily accrue to creditors, and also to make excessively risky investment decisions (Jensen and Meckling 1976). These moral hazard incentives may be exacerbated for larger banks that expect to be bailed out when insolvent. Moreover, banks are inherently opaque (Morgan 2002; Flannery et al. 2013): their lending decisions are based on private information about borrowers and projects that is not available to those outside the bank (e.g., Diamond 1984). Additionally, in contrast to corporate firms, banks are engaged in a broad array of trading activities that produce non-interest income and make them relatively more complex (Morgan 2002; Laeven 2013). In this context, accruals for banks reflect different considerations than those that drive accruals for corporate firms (Cohen et al. 2014). Lastly, since banks finance illiquid loans with liquid deposits, they have incentives to show depositors and other creditors that their portfolios are highly liquid.

To capture the complexity in the functions that banks perform as presented above and the likely manipulation in their relevant accounting figures, we shed light on a wide array of balance sheet and income statement variables used by U.S. authorities to monitor the performance and soundness of banks. We analyse the components of the CAMELS rating system that capture capital adequacy, asset quality, earnings strength, and the level of liquidity, which managers have inherent incentives to manipulate. This more comprehensive analysis distinguishes our paper from the bulk of the Benford’s Law literature that focuses on the manipulation of earnings and earnings-related figures of corporate firms (e.g., Carslaw 1988; Thomas 1989; Niskanen and Keloharju 2000; Van Caneghem 2002). The paper also differs from the banking literature that explores how managers use accounting discretion to manage earnings by employing various loan loss provisioning discretion models (e.g., Ahmed et al. 1999; Beatty et al. 2002; Cohen et al. 2014).Footnote 1

Poor accounting data quality and weak disclosure practices, combined with possible data manipulation in the banking sector, have contributed to the propagation and prolongation of the recent crisis and can possibly plant the seeds of the next financial turmoil. Against this backdrop, an additional innovation of our paper is that we test for data manipulation both prior to the outbreak of the crisis and during the crisis. The crisis offers an excellent research environment that enables us to turn the spotlight on any discrepancies in financial reporting across banks in different financial conditions and to explore how Benford’s Law can be used to document those discrepancies.

We report several interesting results, which are all robust to a number of sensitivity and placebo tests. Banks utilise loan loss provisions to manipulate earnings and interest income upwards throughout the two examined periods regardless of their level of soundness. In the case of bailed-out banks, non-interest income is also found to be manipulated upwards in both periods. Together with loan loss provisions, failed and bailed-out banks resort to a downward manipulation of their allowance for loan losses and non-performing loans. Furthermore, manipulation is found to be more prevalent in distressed banks, which have stronger incentives to conceal their financial difficulties. Importantly, manipulation is both magnified during the crisis period and also expanded to affect regulatory capital. Overall, banks utilise data manipulation to disclose an artificially improved picture of their performance.

The rest of the paper proceeds as follows. Section 2 describes the theoretical underpinnings of Benford’s Law, presents the relevant literature, and clarifies how Benford’s Law addresses the limitations of some previous methods. The data and the tools we employ in our empirical analysis are described in Section 3. Section 4 presents our main empirical results and Section 5 performs a series of robustness checks, and then Section 6 concludes.

2 Benford’s Law

2.1 Background and Theoretical Underpinnings

Newcomb (1881) noted that the initial pages of books of logarithmic tables were more worn than the later ones, implying that numbers with 1 as the first digit were looked up more often than those starting with 2, 3, and so on. On that basis, he developed a set of mathematical theorems to show that low digits have a larger probability than high digits of appearing in the first two positions of a number (Nigrini 2012).

Almost fifty years later, Benford (1938) focused on the expected distribution of digits in data sets that are not contrived by man or machine. He formulated the expected frequencies for the first and second positions in a number, together with their combinations, and found that, when the data are ranked from smallest to largest, they form a geometric sequence. The leading digits follow a logarithmic, weakly monotonic distribution, which is described by the following expressions:

$$ f(i)={\mathit{\log}}_{10}\left(1+{i}^{-1}\right), $$
(1)

and

$$ \sum_{i=0}^9f(i)=1, $$
(2)

where f(i) stands for the frequency of digit i implied by Benford’s Law for the first or another leading digit; i = 0, 1, 2,…, 9; and log10 is the base-10 logarithm. Naturally, the first digit cannot take the value of 0, unless the number under scrutiny is a decimal number. As shown in Table 1, digit 1 has a 30.10% likelihood of appearing in the first digital place within a number, while digit 9 has only a 4.58% likelihood of occurring in that place. More generally, digit 1 occurs more often than any other digit in the first position of a number; the frequency of the remaining digits decreases with the value of the digit.
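As an illustration (ours, not part of the original paper), the following Python sketch reproduces the expected frequencies reported in Table 1: Eq. (1) gives the first-digit probabilities, and the second-digit probabilities are obtained by summing the corresponding first-two-digit terms over all possible first digits.

```python
import math

def benford_first_digit(d: int) -> float:
    """Expected frequency of digit d (1-9) in the first position, Eq. (1)."""
    return math.log10(1 + 1 / d)

def benford_second_digit(d: int) -> float:
    """Expected frequency of digit d (0-9) in the second position:
    the sum of log10(1 + 1/(10*d1 + d)) over all possible first digits d1."""
    return sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))

if __name__ == "__main__":
    for d in range(1, 10):
        print(f"first digit {d}:  {benford_first_digit(d):.4f}")   # digit 1 -> 0.3010, digit 9 -> 0.0458
    for d in range(10):
        print(f"second digit {d}: {benford_second_digit(d):.4f}")  # digit 0 -> 0.1197, digit 9 -> 0.0850
```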

Table 1 Benford’s expected digital frequencies

The expected digital frequencies for the first and the second places based on Benford’s distribution are graphically presented in Fig. 1.

Fig. 1 Benford’s expected frequencies for the first and second digits. This figure contains a graphical representation of the expected frequencies for the first digit (dashed line) and for the second digit (solid line) based on Benford’s distribution. The values on the vertical axis are the distribution probabilities of digit i as shown on the horizontal axis, where i = 0, 1, 2,…, 9

Benford’s distribution of digits is an empirically observable phenomenon. If distributions are selected at random and random samples are taken from each of these distributions, then the significant-digit frequencies of the combined samples are expected to converge to Benford’s distribution, even though the individual distributions may not closely follow the Law. In the context of our research, this means that if the digital frequencies in the bank accounting data depart from the expectations of Benford’s Law, then the data are likely to have been manipulated.
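The following small simulation (our illustration, under arbitrarily assumed distributional parameters) demonstrates this property: samples drawn from many randomly parameterised distributions are pooled, and the first-digit frequencies of the pooled data approach Benford’s expectations.

```python
import math
import random
from collections import Counter

def first_digit(x: float) -> int:
    """First significant digit of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def mixed_sample(n_distributions=500, n_per_dist=200, seed=42):
    """Pool random samples drawn from randomly parameterised lognormal distributions."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_distributions):
        scale = 10 ** rng.uniform(-3, 3)   # random scale for each distribution
        shape = rng.uniform(0.5, 5.0)      # random dispersion for each distribution
        values += [scale * rng.lognormvariate(0, shape) for _ in range(n_per_dist)]
    return values

data = mixed_sample()
counts = Counter(first_digit(v) for v in data)
n = len(data)
for d in range(1, 10):
    print(f"digit {d}: observed {counts[d] / n:.4f}   Benford {math.log10(1 + 1 / d):.4f}")
```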

2.2 Related Literature

Prior literature primarily focuses on the compliance of earnings and earnings-related figures with Benford’s Law in the corporate sector. Carslaw (1988) uses a sample of 220 firms in New Zealand to test whether managers tend to round income upwards when this is below some psychological threshold value. An abnormally high frequency of digit zero and an unusually low occurrence of digit nine are documented in the second position of the examined figures, providing support to the relevant hypothesis. In a similar vein, Thomas (1989) reports similar, though considerably smaller, deviations from the expected frequencies in the quarterly earnings of U.S. firms, whilst a reverse pattern with fewer zeros and more nines is reported for losses.

Tilden and Janes (2012) show that corporate accounting data conform to the Law under normal economic and financial conditions. However, some figures, such as the allowance for doubtful accounts and net income, fail to conform in periods surrounding recessions. Alali and Romero (2013) apply the Law to the books of a sample of U.S. corporate firms to detect any digital frequency discrepancies in their revenues, expenses, assets, and liabilities. The study of Lin and Wu (2014) uses the Law to capture earnings manipulation in the Taiwanese and the U.S. corporate sectors, whereas that of Lin et al. (2018) is solely focused on earnings manipulation in Taiwanese firms. Further research on earnings manipulation based on Benford’s Law has been conducted by Niskanen and Keloharju (2000), Van Caneghem (2002), and Skousen et al. (2004) for Finnish, U.K., and Japanese firms, respectively.

In a different context, Nigrini (1996) uses the Law to show that taxpayers lean towards understating real taxable income. Similarly, Nigrini and Mittermaier (1997) detect fraudulent data in tax payments and accounting statements. Durtschi et al. (2004) provide further support to the importance of the Law in helping auditors to increase their ability to detect fraudulent practices.Footnote 2

2.3 Advantages

Most existing models for detecting accounting manipulation (the Healy (1985) model; the Jones model (Jones 1991); the modified Jones model (Dechow et al. 1995; Kothari et al. 2005); the McNichols (2000) model; the Dechow and Dichev (2002) model) are designed to detect earnings management by partitioning accruals into their discretionary and non-discretionary components (Dechow et al. 2012). They focus on managerial discretion that falls within the relevant accounting standards. Furthermore, as Dechow et al. (2012) point out, these models are implemented through the use of a firm-specific estimation period during which no systematic earnings management is assumed. Unlike Benford’s Law, they therefore cannot capture systematic fraud or accounting data manipulation.

Many models rely on working capital accruals, which are, however, less meaningful for banks. Therefore, studies (e.g., Healy 1985; Dechow and Dichev 2002; Dechow et al. 2012) exclude these institutions from their analysis. Moreover, such models typically analyse firm-specific characteristics to estimate the managed portion of reported earnings, and may therefore incidentally capture the individual characteristics of the examined entities rather than the quality of the accounting data. In contrast, the distribution of the leading digits upon which Benford’s Law relies is independent of firm-level characteristics (Dechow et al. 2010; Amiram et al. 2015). Benford’s Law hinges upon the distributional properties of the reported accounting figures and, as such, it measures the quality of the data directly, rather than through variables correlated with the examined figures (Amiram et al. 2015). This is crucial in the context of our research, as banks follow different business models.

Benford’s distribution is very simple to calculate, is scale-independent, and fits every currency (Amiram et al. 2015). Also, in contrast to accrual models, it does not require aggregated, large time-series, or cross-sectional data (Dechow et al. 1995; Dechow et al. 2011). Additionally, Benford’s Law constitutes a non-econometric approach in that it does not rely on the estimation of a model, so statistical inference cannot be biased by unobserved characteristics, omitted variables, or changes in the underlying model parameters that cause the estimated coefficients to deviate from the actual ones (Kothari et al. 2005; Dechow et al. 2010).Footnote 3 Benford’s Law neither requires forward-looking information such as future realisations of cash flows (e.g., Dechow and Dichev 2002), nor relies on returns or price information that may introduce selection bias (Amiram et al. 2015). Also, if realisations of forward-looking information are correlated with current information, their use may create inference bias (Amiram et al. 2015).

Other studies develop composite indices instead of employing accrual-based models to estimate earnings manipulation. Beneish’s M-Score (Beneish 1999; Beneish et al. 2013) is constructed as a linear combination of firm-level performance variables such as gross margin and sales growth. As such, it is difficult to draw conclusions about errors that are separate from firm performance (Amiram et al. 2015). Importantly, because banks have highly leveraged capital structures, the M-Score, which keys in on leverage, is not applicable to banks and is, hence, removed from the analyses of Beneish (1999) and Beneish et al. (2013). Moreover, unlike Beneish’s M-Score, Benford’s Law does not require the ex ante identification of managerial incentives to capture data manipulation. Dechow et al. (2011) develop the F-Score, which is used to estimate the likelihood of earnings misstatement. However, it cannot be applied to banks because of the many accrual-related variables used in its construction.

3 Empirical Analysis

3.1 Data Period

We utilise quarterly data from the first quarter of 2003 (2003q1) to the last quarter of 2012 (2012q4), when the banking crisis in the U.S. came to a halt in that the number of failed or bailed-out banks shrank significantly in 2013 and thereafter. The sample period is divided into two sub-periods. The former includes the quarters prior to the outbreak of the crisis (2003q1-2007q3), that is, before September 2007 when the TED spread widened to almost 200 basis points relative to its historically stable range of 10–50 basis points.Footnote 4 This period follows the development and implementation of the Sarbanes-Oxley Act in mid-2002, which enhanced scrutiny of financial reporting in the U.S. The latter period extends from 2007q4 to 2012q4 and encompasses the crisis quarters in which financial turbulence, uncertainty, and distress prevailed in the economy. This was the period when several financial reporting misstatements and frauds in the banking industry occurred (see, e.g., Securities and Exchange Commission (SEC) 2013, 2014, 2015).

3.2 Data Set

We focus on the U.S. commercial and savings banking institutions that file a Report of Condition and Income (also known as a Call Report). Sample banks are classified into distressed and non-distressed banks. We define distressed banks as those that either filed for bankruptcy or were bailed out during the crisis. For the period from 2007q4 to 2012q4, 449 bank failures were recorded (396 commercial banks and 53 savings banks) based on FDIC data. To construct the sample of bailed-out banks, we identify all banks that received financial aid via the Capital Purchase Program (CPP) of the Troubled Asset Relief Program (TARP). The TARP/CPP was the largest U.S. government bailout programme in history. It was established in 2008 and authorised the U.S. Treasury to inject capital into the banking system by purchasing up to $250 billion of preferred stock and equity warrants from undercapitalised banks. Based on U.S. Treasury data, we identify 731 banks that received TARP/CPP funding via their parent holding companies (HCs).Footnote 5 We add the 93 banks that are not linked to an HC to construct the final sample of 824 bailed-out banks. The group of non-distressed banks is composed of the institutions that stayed afloat as going concerns in that they neither failed, nor received TARP/CPP assistance, nor merged with or were acquired by another institution throughout the entire sample period. Overall, we identify 6302 non-distressed banks.

3.3 Variables

The bank accounting data that we test are based on those used by the Uniform Financial Institutions Rating System, known as CAMELS.Footnote 6 CAMELS ratings are not publicly available and cannot be exactly replicated with public information, so we follow the literature (see, e.g., Stojanovic et al. 2008; Duchin and Sosyura 2012) to construct a set of accounting variables that reflect Capital adequacy, Asset quality, Earnings strength, and Liquidity. We do not account for Sensitivity to market risk because this component is captured by market data, nor do we consider Management expertise because supervisors likely rate this component on qualitative, subjective factors.Footnote 7

We measure capital adequacy by the core regulatory capital (TIER1); asset quality is measured by loan loss provisions (LLP); earnings strength is based on total interest income (INTINC) and total non-interest income (NINTINC); and liquidity is captured by cash and balances due from depository institutions (CASH). The construction of variables relies on quarterly data collected from Call Reports available on the websites of the Federal Reserve Bank of Chicago and the Federal Financial Institutions Examination Council (FFIEC) Central Data Repository’s Public Data Distribution. All variables and their data sources are described in Table 2.

Table 2 Bank accounting variables and data sources

3.4 Digital Analysis

Seminal studies (e.g., Carslaw 1988; Thomas 1989) as well as more recent studies (Niskanen and Keloharju 2000; Van Caneghem 2002) on manipulation based on Benford’s Law primarily focus on the second-from-the-left position of the examined accounting figures. The main reason is that manipulation can occur independently of the first position; if manipulation is detected in the second position, its pattern can then be identified by applying the Law to the first position.

We compare the actual frequencies of the digits zero through nine appearing in the second position of the variables of interest with the expected frequencies predicted by Benford’s Law. If a variable is manipulated upwards to increase the digit in the first position by one, a higher than anticipated proportion of low digits (mostly zeros or ones) and a lower than expected frequency of high digits (mostly nines or eights) will occur in the second position. This constitutes an uncommon digital pattern that violates Benford’s Law and indicates upward data manipulation. If the reverse pattern is observed, i.e., more high digits and fewer low digits than expected appear in the second position, then the data are suspected of downward manipulation. A positive deviation implies that the actual proportion exceeds the expected one, while a negative deviation indicates the converse.
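A minimal Python sketch of this comparison is given below (our illustration; the toy series and its values are hypothetical). It extracts the second-from-the-left digit of each reported figure and contrasts the observed proportions with the Benford expectations for the second position.

```python
import math
from collections import Counter

def benford_second_digit(d: int) -> float:
    """Expected Benford frequency of digit d (0-9) in the second position."""
    return sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))

def second_digit(value: float):
    """Second-from-the-left digit of a reported figure, or None if it has only one digit.
    Assumes positive figures reported as integer amounts (e.g., thousands of dollars)."""
    digits = f"{abs(value):.0f}".lstrip("0")
    return int(digits[1]) if len(digits) > 1 else None

def second_digit_proportions(values):
    """Observed second-digit proportions for a series of reported figures."""
    counts = Counter(d for d in map(second_digit, values) if d is not None)
    n = sum(counts.values())
    return {d: counts[d] / n for d in range(10)}, n

# Hypothetical toy series standing in for a reported variable (e.g., INTINC)
figures = [10234, 1987, 20411, 10990, 30875, 11002, 19844, 10050, 40120, 10877]
observed, n = second_digit_proportions(figures)
for d in range(10):
    print(f"digit {d}: observed {observed[d]:.3f}   expected {benford_second_digit(d):.3f}")
```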

Following the literature (e.g., Thomas 1989; Nigrini 1996; Durtschi et al. 2004), we resort to the normally distributed z-statistic to examine the statistical significance of the deviations in the actual and the expected proportions:

$$ {z}_i=\frac{\mid {p}_0-{p}_i\mid -\left(\frac{1}{2n}\right)}{s_i}, $$
(3)

where \( z_i \) is the z-statistic for digit i, with i = 0, 1, 2,…, 9; \( p_0 \) denotes the actual (observed) proportion; \( p_i \) is the expected proportion based on Benford’s Law; n is the number of observations of the examined variable; the term \( \frac{1}{2n} \) is the Yates continuity correction term, applied only when its value is smaller than \( \left|{p}_0-{p}_i\right| \) to bring the normal and binomial probability curves into close agreement; and \( s_i \) is the standard deviation of digit i, given by:

$$ {s}_i={\left[\,{p}_i\left(1-{p}_i\right)/n\,\right]}^{1/2}. $$
(4)

The z-statistic tests the null hypothesis that the actual proportion does not statistically differ from the expected proportion based on Benford’s Law. As the difference between \( p_0 \) and \( p_i \) increases in Eq. (3), the z-statistic becomes larger. A z-statistic of 2.57 corresponds to a p value of 0.01, a value of 1.96 to a p value of 0.05, and a value of 1.64 to a p value of 0.10.
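To make the computation concrete, the following sketch (ours, using hypothetical numbers) implements Eqs. (3) and (4) for a single digit.

```python
import math

def z_statistic(p_obs: float, p_exp: float, n: int) -> float:
    """z-statistic of Eq. (3) for one digit.

    p_obs: actual (observed) proportion of the digit; p_exp: expected proportion
    under Benford's Law; n: number of observations. The Yates continuity
    correction 1/(2n) is applied only when it is smaller than |p_obs - p_exp|."""
    s = math.sqrt(p_exp * (1 - p_exp) / n)                    # Eq. (4)
    diff = abs(p_obs - p_exp)
    correction = 1 / (2 * n) if 1 / (2 * n) < diff else 0.0
    return (diff - correction) / s

# Hypothetical example: 17.5% observed zeros in the second position vs. the
# 11.97% Benford expectation, with n = 1,000 observations
print(f"z = {z_statistic(0.175, 0.1197, 1000):.2f}")          # exceeds 2.57, significant at 1%
```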

Whereas the z-statistic tests one digit at a time, the chi-square test is a ‘goodness-of-fit’ test conducted jointly over all ten digits that can occupy the second position of the variables under scrutiny (see, e.g., Carslaw 1988; Thomas 1989; Durtschi et al. 2004). The chi-square test explores whether the observed distribution significantly differs from the expected distribution. If the chi-square test rejects the null hypothesis that the probability of all digits conforms to the expected distribution under the Law, then this is a strong signal of data manipulation. The value of the chi-square statistic is determined as follows:

$$ {\chi}_{(9)}^2=n\sum_{i=0}^9\frac{{\left({\hat{\theta}}_i-f(i)\right)}^2}{f(i)}, $$
(5)

where \( {\hat{\theta}}_i \) is the observed frequency of digit i; f(i) stands for the frequency of digit i implied by the Law, as defined in Eqs. (1) and (2); and n is the number of observations for the examined variable. The conformity of the entire distribution is tested, so the terms are summed over all digits i = 0, 1, 2,…, 9. As we focus on the second position, we have nine degrees of freedom, i.e., \( {\chi}_{(9)}^2 \). The 10%, 5%, and 1% critical values for \( {\chi}_{(9)}^2 \) are 14.68, 16.92, and 21.97, respectively.
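A corresponding sketch of Eq. (5) (ours; it assumes the observed second-digit proportions have already been computed, e.g., with the helper shown earlier) is:

```python
import math

def chi_square_second_digit(observed_props: dict, n: int) -> float:
    """Chi-square statistic of Eq. (5) over the ten digits of the second position.

    observed_props: mapping from digit (0-9) to its observed proportion;
    n: number of observations of the examined variable."""
    def expected(d: int) -> float:
        return sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))
    return n * sum((observed_props[d] - expected(d)) ** 2 / expected(d) for d in range(10))

# Hypothetical proportions tilted towards low digits (they sum to one)
props = {0: 0.175, 1: 0.145, 2: 0.115, 3: 0.105, 4: 0.095,
         5: 0.085, 6: 0.08, 7: 0.075, 8: 0.065, 9: 0.06}
print(chi_square_second_digit(props, 1000))   # about 54, above the 1% critical value of 21.97
```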

The digit-by-digit analysis based on the z-statistic and the variable-by-variable chi-square test are complementary. The former has a discriminatory character in that it draws attention to particular digits with peaks, and it is utilised to reduce the probability of Type I and Type II errors. A Type I error occurs when a variable that has not been manipulated is flagged as having been subject to human intervention; a Type II error occurs when a variable that has actually been subject to some degree of human intervention is not flagged as manipulated. The latter test investigates whether the behaviour of a variable as a whole warrants further examination.

4 Results

The results of the digital analysis for the second position based on the z-statistic and the chi-square tests are presented in Tables 3 and 4 for the pre- and the crisis periods, respectively. For ease of comprehension, we provide graphical representations of the results in Figs. 2 and 3.

Table 3 Digital analysis of the second position over the pre-crisis period
Table 4 Digital analysis of the second position over the crisis period
Fig. 2 Graphical representation of the results of digital analysis in the pre-crisis period. Panels A, B, C, and D provide graphical representations of the results of digital analysis for the second position in the manipulated accounting variables (TIER1, LLP, INTINC, and NINTINC, respectively) in the pre-crisis period. The values on the vertical axis are the distribution probabilities of digit i (i = 0, 1, 2,…, 9) as shown on the horizontal axis

Fig. 3 Graphical representation of the results of digital analysis in the crisis period. Panels A, B, C, and D provide graphical representations of the results of digital analysis for the second position in the manipulated accounting variables (TIER1, LLP, INTINC, and NINTINC, respectively) in the crisis period. The values on the vertical axis are the distribution probabilities of digit i (i = 0, 1, 2,…, 9) as shown on the horizontal axis

4.1 The Pre-Crisis Period

The actual proportion of zeros appearing in the second position of interest income (INTINC) for non-distressed banks exceeds the expected proportion by 5.56%. The reported deviation is statistically significant at the 5% level. Similarly, the actual proportion of ones significantly exceeds the expected proportion by 3.95%. Applying the same procedure to the remaining digits (i.e., two through nine) appearing in the second position of INTINC shows that eights and nines produce actual proportions which are significantly (at 5%) lower than the anticipated ones; the deviations are equal to −2.56% and −7.18%, respectively. The chi-square test suggests a rejection of the hypothesis that the INTINC data are generated by Benford’s distribution. Overall, the results provide a clear indication of data manipulation: banks which remained afloat appeared to round INTINC upwards in the years preceding the crisis to signal higher profitability and more robust interest-bearing activities. This group of banks also manipulated loan loss provisions downwards, as shown in Fig. 2 (Panel B), confirming earlier research (e.g., Greenawalt and Sinkey Jr. 1988; Kanagaretnam et al. 2004) which indicates that bank managers largely use provisions for earnings management. For the remaining three variables (TIER1, NINTINC, CASH), no significant deviations are reported.

The results also reveal large positive (negative) and significant proportional deviations in digits zero and one (eight and nine) of INTINC for failed banks. These deviations are larger in magnitude and more statistically significant than those observed for the non-distressed banks, as shown in Table 5 (Panel C), which presents the significance tests of the deviations for digit 0 among the three types of banks across the two time periods based on the z-statistic.Footnote 8 Moreover, a downward bias is reported in LLP, as Fig. 2 (Panel B) displays. This means that banks which later filed for bankruptcy appear to have been systematically manipulating LLP downwards in the years prior to the crisis. The deviations in LLP are both numerically larger than and statistically different from those reported for the non-distressed banks, as Table 5 (Panel B) demonstrates. The upward manipulation of INTINC and the downward manipulation of LLP are both confirmed by the results of the chi-square test. As for their non-distressed peers, the discrepancies between the actual and the expected frequencies for the remaining three variables (TIER1, NINTINC, CASH) of failed banks are not significant.

Table 5 Statistical significance of the reported deviations from Benford’s Law for digit zero

Possible upward data manipulation of interest income is also documented for the group of bailed-out banks. The actual proportion of zeros appearing in the second position of INTINC exceeds the expected proportion by 7.32%, and the actual proportion of ones exceeds the expected proportion by 3.90%. These deviations are significant at the 1% and 5% levels, respectively. Eights and nines, in turn, display actual proportions which are significantly lower than the anticipated ones. Interestingly, bailed-out banks are found to manipulate non-interest income (NINTINC) upwards and loan loss provisions (LLP) downwards, as shown in Fig. 2 (Panels D and B, respectively). A goodness-of-fit test produces significant results for all three manipulated variables (INTINC, NINTINC, LLP). Importantly, the deviations in LLP and INTINC are both numerically larger than and statistically different from those reported for non-distressed banks, as Table 5 (Panels B and C, respectively) demonstrates.

4.2 The Crisis Period

Table 4 reports evidence of a strong upward data manipulation in interest income for all three groups of banks. The manipulation levels of INTINC are higher compared to those documented prior to the crisis: the incidence of zeros and ones in the second position of INTINC increases during the crisis, while, at the same time, the occurrence of eights and nines is lowered. The chi-square test provides significant results at either the 1% or 5% level for INTINC across the three groups of banks. Distressed (failed and bailed-out) banks manipulate INTINC to a greater extent compared to their non-distressed counterparts, similar to the pre-crisis period. The relevant differences are significant at the 5% level, as displayed in Table 5 (Panel C). In addition, bailed-out institutions exert an upward manipulative influence on non-interest income (NINTINC), as in the pre-crisis era. The chi-square test is significant at the 1% level, which suggests the rejection of the null of Benford’s distribution.

As regards the quality of bank assets, the z-statistics in Table 4 reveal significantly negative deviations in digit zero and significantly positive deviations in digit nine for loan loss provisions (LLP) for all three groups of banks. The overall level of downward manipulation is significantly different for the two groups of distressed banks compared to their non-distressed counterparts (see Table 5, Panel B).

Remarkably, an excess of zeros and a shortage of nines in the second position of TIER1 is reported for all three groups of banks. This constitutes a strong indication of upward data manipulation of regulatory capital, as demonstrated in Fig. 3 (Panel A). The chi-square test confirms the statistical validity of all relevant deviations. Notably, the deviations between the distressed (i.e., failed and bailed-out) and the non-distressed banks are significant at 5%, whereas those between the failed and bailed-out banks are not significant (Table 5, Panel A). Accordingly, banks, and especially the distressed ones, appear to act in accordance with the findings of Tilden and Janes (2012) of increased financial statement manipulation during recessionary times. Banks manipulate regulatory capital upwards by holding back on loan loss provisioning. Such strategic behaviour, which is in line with the findings of Huizinga and Laeven (2012), enables banks with impaired assets to satisfy capital requirements even though their true health is problematic.

4.3 Across the Two Periods

All banks utilise loan loss provisions throughout the two periods to manipulate interest income upwards. In the case of bailed-out banks, the majority of which are large and universal institutions with a broad range of financial activities, non-interest income is also found to have been manipulated upwards in both periods.Footnote 9 Banks’ performance-based remuneration practices are widely considered to create strong managerial incentives to manipulate income. We provide solid evidence that data manipulation is more prevalent in distressed banks across the two periods. Weak banks tend to manage interest and non-interest income upwards by delaying loan loss provisions (Liu and Ryan 2006). Although distressed banks may be more closely monitored by regulators, they may also be more desperate to report the performance levels necessary for their survival.

Manipulation in asset quality (LLP) and interest income (INTINC) is both numerically larger and statistically stronger in the crisis period than in the pre-crisis period (Table 5, Panels B and C, respectively). Similarly, the manipulation in NINTINC that we document for bailed-out banks in the crisis period is stronger than that reported in the pre-crisis period; the relevant difference is significant at 5% (Table 5, Panel D). Importantly, manipulation expands to affect regulatory capital. The reported differences in TIER1 between the two periods are highly significant for all banks (Table 5, Panel A). Although the prospect of government bailouts creates incentives for distressed banks to take excessive risks (Kane 2018), a crucial prerequisite for obtaining TARP/CPP assistance after the outbreak of the crisis was that applicant banks prove they were viable. Hence, banks were incentivised to show Federal assessors, among others, higher income streams and cash flows than the actual ones.

4.4 Patterns of Manipulation

We now examine the patterns of manipulation by applying Benford’s Law to the first position of each of the relevant variables, INTINC, NINTINC, LLP, and TIER1. The results for the pre-crisis period are presented in Table 6 and are very similar to those obtained for the crisis period.Footnote 10 The actual proportion of ones appearing in the first position of INTINC and TIER1 is higher than anticipated for all banks; eights and (mostly) nines, on the other hand, are observed less often than expected. The same pattern is detected for NINTINC for the bailed-out banks. As regards LLP, the actual proportion of ones is lower than anticipated for the three groups of banks, whereas that of nines is higher than expected. The aforementioned deviations are significant at the 1% and 5% levels based on the z-statistic. The chi-square tests are also significant at the same levels.

Table 6 Digital analysis of the first position over the pre-crisis period

Our results demonstrate that the data manipulation pattern is common among the examined variables. The documented upward pattern suggests that when an accounting figure starts with either the digit eight or nine it is more likely to be rounded up to a figure which has a one as the first digit. To give an example, a regulatory capital of $894 million is likely to be rounded up to $1.012 billion.Footnote 11 In the context of a downward manipulation, on the other hand, the reported pattern shows that banks tend to decrease the value of an accounting figure when its first digit is one to a figure which starts with the digit nine. For instance, loan loss provisions of $1.085 million are manipulated downwards to $991,062. Interestingly, neither the level of soundness of banks under scrutiny (i.e., distressed vs non-distressed banks), nor the occurrence of the crisis alter the documented pattern of manipulation.
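To illustrate how such a rounding rule translates into the digit patterns we document, the following toy simulation (ours; the underlying distribution and the ‘round-up’ rule are purely hypothetical) pushes figures whose first digit is eight or nine just past the next power of ten and reports the resulting shares of first-digit ones and second-digit zeros, which come out well above the Benford benchmarks of 30.10% and 11.97%.

```python
import math
import random
from collections import Counter

def leading_digits(x: float):
    """(first, second) significant digits of a positive number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x), int(x * 10) % 10

def simulate_round_up(n=50_000, seed=7):
    """Hypothetical manipulation rule: any figure starting with 8 or 9 is rounded
    up to just above the next power of ten (e.g., 894 becomes roughly 1,0xx)."""
    rng = random.Random(seed)
    firsts, seconds = Counter(), Counter()
    for _ in range(n):
        value = rng.lognormvariate(10, 2.5)                 # unmanipulated figure
        d1, _ = leading_digits(value)
        if d1 in (8, 9):
            magnitude = 10 ** math.floor(math.log10(value))
            value = 10 * magnitude * rng.uniform(1.00, 1.05)
        d1, d2 = leading_digits(value)
        firsts[d1] += 1
        seconds[d2] += 1
    return firsts, seconds

firsts, seconds = simulate_round_up()
n = sum(firsts.values())
print(f"share of first-digit 1:  {firsts[1] / n:.3f}  (Benford: 0.301)")
print(f"share of second-digit 0: {seconds[0] / n:.3f}  (Benford: 0.120)")
```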

Our results, which are consistent with those obtained by studies such as Carslaw (1988), Thomas (1989), and Van Caneghem (2002), suggest that banks exercise manipulation to increase the magnitude of balance sheet (TIER1) and income statement variables (INTINC and NINTINC) when the level of these variables is slightly below a round number. By the same token, banks manipulate loan loss provisions downwards when their level is slightly above a round number. That is, banks utilise data manipulation to conceal their financial difficulties and present an artificially improved view of their performance to outsiders.

5 Robustness Analysis

We carry out a number of different tests to validate the robustness of our findings. We start by testing the sensitivity of our results to a set of alternative CAMELS components. The book equity capital (EQUITY) is used to measure capital adequacy; asset quality is measured by non-performing loans (NPL) and by allowances for loan losses (ALLOW); earnings strength is based on total earning assets (EARNAS); and liquidity is captured by federal funds purchased and securities sold under agreements to repurchase (REPOS). Variables and data sources are described in Table 2.

The results for the pre-crisis period in Table 7 reveal the occurrence of positive (negative) and significant proportional deviations in digits zero and one (eight and nine) for EARNAS across the three groups of banks. A downward bias is reported in ALLOW for the groups of distressed banks as there are more nines and fewer zeros than anticipated in the second digit; deviations are significant at the 5% level. The goodness-of-fit chi-square test produces significant results for both EARNAS and ALLOW. No indication of manipulation is found for bank capital (EQUITY) and liquidity (REPOS) in the pre-crisis period, which corroborates our baseline results.

Table 7 Robustness test: Digital analysis of the second position over the pre-crisis period

To the extent that loan loss allowances are included in the regulatory capital of banks, managers have incentives to manipulate ALLOW downwards with the purpose of demonstrating that a strong capital cushion has been set aside to absorb shocks in a financial turmoil. Under Statement of Financial Accounting Standards No. 5 (FASB 1975), entitled ‘Accounting for Contingencies’, when credit losses are probable and can be reasonably estimated, an expense called loan loss provisions and a contra-asset (to earning assets outstanding) called loan loss allowances should be recorded on banks’ accounting books. That is, by manipulating loan loss provisions downwards, as documented in our baseline analysis, banks are inclined to follow the same or similar strategies with loan loss allowances, and, by doing so, are able to artificially increase the volume of earning assets.

Turning to the crisis period, the results in Table 8 reveal an upward manipulation of EARNAS for all three groups of banks; the manipulation is stronger compared to the pre-crisis period. Regarding asset quality, the z-statistics demonstrate significantly negative deviations in digit zero and significantly positive deviations in digit nine for both ALLOW and NPL for the two groups of distressed banks. The chi-square test confirms the statistical validity of these deviations. The downward manipulation in NPL can be explained by the direct relation that holds between non-performing loans and loan loss provisions: higher levels of non-performing loans imply troubles in the loan portfolio of banks, and these troubles are normally reflected in higher loss provisions, as documented in our baseline analysis (Kanagaretnam et al. 2010). Interestingly, we find no indication of manipulation in EQUITY. This means that banks are mainly interested in appearing to be sufficiently capitalised by solely manipulating their regulatory capital (TIER1) upwards. Lastly, by applying Benford’s Law to the first position of EARNAS, ALLOW, and NPL, we corroborate the manipulation patterns detected in our baseline analysis.

Table 8 Robustness test: Digital analysis of the second position over the crisis period

As an additional robustness test, we consider Nigrini’s (1996) Distortion Factor (DF) model, which indicates whether data are overstated or understated and also estimates the extent of manipulation. The DF model compares the mean of the actual numbers in a data set with the mean of the numbers expected under Benford’s Law. Because a data set that closely approximates the Law can contain arbitrarily large or small numbers, it has no unique expected mean; Nigrini (1996) therefore suggests moving the decimal point of each actual number so that it falls into the interval [10, 100). More concretely, each number below 10 is expanded to a number within the range [10, 100), and each number of 100 or more is collapsed to a number within that range. For example, the number 4.29 expands to 42.9, while the number 1040 collapses to 10.4. A comparison is then made between the mean of the actual numbers scaled to the [10, 100) range and the mean of the expected numbers that conform to the Law.
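The scaling step can be sketched as follows (our illustration):

```python
def scale_to_10_100(x: float) -> float:
    """Move the decimal point of a positive number so that it falls in [10, 100)."""
    x = abs(x)
    while x >= 100:
        x /= 10
    while x < 10:
        x *= 10
    return x

print(scale_to_10_100(4.29))   # 42.9
print(scale_to_10_100(1040))   # 10.4
```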

The Actual Mean (AM) of the n collapsed or expanded data is:

$$ AM=\frac{\sum X}{n}, $$
(6)

where X stands for the collapsed or the expanded data values, and n is the number of observations of the examined variable. The Expected Mean (EM) of Benford’s distribution is:

$$ EM=\frac{90}{n\left({10}^{\frac{1}{n}}-1\right)} $$
(7)

We can now calculate DF as follows:

$$ DF=\frac{\left( AM- EM\right)100}{EM} $$
(8)

Equation (8) reflects the average percentage manipulation of the examined data. When DF is positive (negative), an upward (downward) manipulation is detected. Since AM and EM are the means of n random variables, the distribution of DF approaches the normal distribution according to the central limit theorem. Hence, the z-statistic that tests the null hypothesis that the AM equals the EM can be computed for a relatively large n.Footnote 12
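A compact sketch of Eqs. (6)–(8) is given below (ours; the synthetic data are for illustration only). Scaled values that conform to Benford’s distribution yield a DF close to zero, whereas scaled values biased towards larger magnitudes yield a positive DF.

```python
import random

def scale_to_10_100(x: float) -> float:
    """Move the decimal point of a positive number so that it falls in [10, 100)."""
    x = abs(x)
    while x >= 100:
        x /= 10
    while x < 10:
        x *= 10
    return x

def distortion_factor(values) -> float:
    """Distortion Factor of Eqs. (6)-(8), in percent; positive (negative) values
    indicate upward (downward) distortion relative to Benford's expectation."""
    scaled = [scale_to_10_100(v) for v in values if v != 0]
    n = len(scaled)
    am = sum(scaled) / n                          # Actual Mean, Eq. (6)
    em = 90 / (n * (10 ** (1 / n) - 1))           # Expected Mean, Eq. (7)
    return 100 * (am - em) / em                   # DF, Eq. (8)

rng = random.Random(0)
benford_like = [10 ** rng.uniform(1, 2) for _ in range(50_000)]   # Benford-conforming significands
biased_high = [rng.uniform(10, 100) for _ in range(50_000)]       # significands biased towards high values
print(f"DF (conforming data): {distortion_factor(benford_like):+.2f}%")   # close to zero
print(f"DF (biased data):     {distortion_factor(biased_high):+.2f}%")    # clearly positive
```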

To enhance the validity of this robustness test, we account for possible selection bias in our sample of banks. Therefore, we exclude the distressed banks that were not expected to be allowed to fail because they were systemically important. The managers of those banks had, in principle, little incentive to resort to manipulation. On the same day that the U.S. Treasury launched the TARP/CPP, nine of the largest banks, which together accounted for approximately 55% of U.S. banks’ assets, were ‘arm-twisted’ by the authorities into participating in the programme. Those banks were: Bank of America, Citigroup, JP Morgan Chase, Wells Fargo, Morgan Stanley, Goldman Sachs, Bank of New York Mellon, State Street, and Merrill Lynch. We also exclude Washington Mutual since its largest subsidiary was a thrift institution and not a commercial or a savings bank. By excluding these banks from our sample, we remove the impact of extreme values and outliers that may affect the observed frequencies and the relevant digital analysis.

Table 9 presents the DF values in percentage terms for all ten variables under scrutiny for the three groups of banks over the two periods. The reported signs and magnitudes of DF confirm the results obtained in our baseline analysis. Total earning assets (EARNAS) are significantly distorted upwards by 7.41%, 6.93%, and 6.53% for the non-distressed, failed, and bailed-out banks, respectively, in the pre-crisis period. The relevant percentages in the crisis years are 7.82%, 9.30%, and 8.81% for the three banking groups, respectively. A very similar distortion pattern is documented for total interest income (INTINC). Furthermore, a significant upward distortion of 7.42% in the pre-crisis period and 8.94% in the crisis period is documented for the non-interest income (NINTINC) of bailed-out banks, which is also in line with our baseline findings. We further confirm that manipulation is magnified and expanded after the outbreak of the crisis.Footnote 13 All banks are found to be involved in an upward manipulation of the core regulatory capital during the crisis period. Indeed, TIER1 is significantly manipulated upwards, and the extent of manipulation is higher for the distressed institutions. Moreover, loan loss allowances (ALLOW) are significantly distorted downwards by failed and bailed-out banks in the pre-crisis period (−5.94% and −6.20%, respectively). The magnitude of distortion is enhanced in the crisis period (−8.53% and −7.64%, respectively). Loan loss provisions (LLP) are also found to be distorted downwards by all banks across the two time periods. The reported distortion is significant at the 5% level and is stronger in the crisis period. As regards non-performing loans (NPL), these are manipulated downwards by the set of distressed banks not only in the crisis period (as evidenced in the baseline analysis), but also in the pre-crisis period.

Table 9 Distortion factor model

To further enhance the robustness of our results, we confirm the validity of the crisis effect and that of the effect of distressed banks by carrying out two placebo tests. Lastly, the ability of Benford’s Law to predict data manipulation is corroborated by an out-of-sample prediction analysis we conduct.Footnote 14

6 Conclusions

Manipulation may transmit noise to the operation of the banking sector. It distorts the expectations of shareholders and other investors about individual bank valuations and the overall financial conditions in the sector. It also misleads authorities in their task of identifying and addressing any problems in the operation of banks and of unravelling any sources of instability for the whole industry. We utilise Benford’s Law to test whether and to what extent a set of fundamental balance sheet and income statement data were manipulated prior to and during the crisis.

Several interesting findings are reported in our baseline analysis and are corroborated by robustness checks. Banks, regardless of their financial condition, appear to utilise loan loss provisions to manipulate earnings and interest income upwards throughout the two periods. For bailed-out banks, non-interest income, which is a key income component in their business model, is also manipulated upwards in both periods. Furthermore, distressed (bailed out and failed) banks resort to the downward manipulation of allowances for loan losses and non-performing loans and this, in combination with the delay in loan loss provisions, enables them to artificially increase their reported earnings. As such, manipulation is more evident in distressed institutions. Moreover, manipulation is magnified in the crisis period for all the examined banks. It also encompasses regulatory capital.

In sum, banks’ data manipulation decreases the reliability of accounting information, erodes confidence, and undermines their credibility. Manipulation yields a distorted view of the financial conditions and the health of banks, which may have the effect of increasing regulatory forbearance. It is, therefore, crucial for authorities and other outsiders to find ways to reduce manipulation. Our results call for a more in-depth evaluation of the quality of the bank accounting information by applying a Benford’s Law-type analysis, which can assist authorities in deterring manipulation both under normal financial conditions, and during financial debacles when the phenomenon is exaggerated.