1 Introduction

Prior literature shows that stock markets overprice the total and/or discretionary accrual components of earnings (see, for example, Sloan 1996; Teoh et al. 1998a, b; Xie 2001; Desai et al. 2004; Iqbal et al. 2009; Iqbal and Strong 2010; Wu et al. 2010; Simlai 2021). Similarly, accruals, especially the discretionary component which managers have the discretion to manipulate, are widely used as a proxy for earnings management. Hence, the mispricing of accruals could be attributable to investors failing to fully reflect on the ‘true’ earnings that a manager knows but does not truthfully reveal to the market.

One can arguably question the intuition of the above story. Earnings management, of course, does not happen for no reason. Assuming managers could exercise their discretion to successfully hide the ‘true’ earnings and subsequently ‘fool’ the market, the manipulation of accounts would have happened in a context which the managers could not conceal, be it a personal motivation, a benefit to the shareholders, a pressure, or a suitable opportunity for managing earnings. The market might not observe the actual earnings manipulation, but it could sense the existence of such manipulation if the general context surrounding the firm is susceptible. Since it is hard for managers to conceal the general context surrounding their firms, if investors are still ‘fooled’ by earnings management, they must have underreacted to the information contained in the context. This paper develops an empirical proxy for the context of earnings management and empirically tests the hypothesis that markets underreact to that context.

The main objective of this paper is to test whether the context of earnings management is mispriced. To achieve this, we begin by developing a model that captures the context in which earnings management is likely to occur. The model accumulates various signals which are extracted mainly from annual financial statements. It generates a composite score, namely ESCORE, which accumulates 15 individual binary scores based on the rich extant literature on likely signals of earnings management. We group these signals into four broad categories: (i) the incentives for earnings management, (ii) the pressures on managers, (iii) the constraints to manage earnings, and (iv) the firm’s innate characteristics. Next, using a sample of UK listed firms during the period 1995–2011, we test the effectiveness of ESCORE in capturing the context of earnings management by applying existing models of earnings management from the literature. We find that high-ESCORE firms engage in (both accruals and real) earnings management of larger magnitude and are more likely to engage in aggressive (i.e. income-increasing) earnings management practices. We also find that firms which are required to restate their annual accounts by the Financial Reporting Review Panel (FRRP henceforth) have a higher ESCORE in the year to which the restatement is related. Furthermore, we present evidence showing that the distribution of the first digits of figures on financial statements released by firms with higher ESCORE deviates significantly from the distribution expected under Benford’s law. According to this law, the first digits of accounting figures, like those of many other natural datasets, would be distributed in a way that makes smaller digits more likely to occur (Amiram et al. 2015). Our evidence tends to suggest that a higher ESCORE is associated with a more anomalous distribution of the first digits of figures on financial statements, a sign suggesting the presence of earnings management.

Having established that ESCORE is capable of capturing the context in which earnings management is more likely to occur, we further investigate whether ESCORE could predict one-year-ahead stock returns. The results show that a zero-investment hedge portfolio that takes a long position in low-ESCORE stocks and a short position in high-ESCORE stocks would earn an average abnormal return of 1.37% per month after adjusting for market, size, book-to-market and momentum factors for up to one year after portfolio formation. In multivariate regressions, ESCORE is negatively and significantly related to one-year-ahead buy-and-hold returns after controlling for other existing market ‘anomalies’, including the mispricing of discretionary accruals. The results are robust across different portfolio weighting schemes and models to estimate abnormal returns. We also show, using US data, that the main conclusions of the paper are generalizable to the US market. Overall, the evidence strongly suggests that the context of earnings management can be used to predict future stock returns.

This paper contributes to the literature in several ways. First, the ESCORE is an alternative model to detect earnings management which accumulates the signals of earnings management. Although most existing studies in the earnings management detection area are ‘contextual’ in nature, with attempts to link one or some characteristics of the context to (often a particular type of) earnings management, none has accumulated those signals to construct an index which could capture the ‘general context’ associated with various earnings management strategies as the ESCORE does. For example, the MSCORE (Beneish 1999) and FSCORE (Dechow et al. 2011) are both constructed based on various financial statement information, but they are developed to specifically detect violations of generally accepted accounting principles (GAAP hereafter) and earnings restatements, respectively. By design, therefore, those indices are silent about earnings management strategies which are within GAAP, such as accruals and real earnings management. The ESCORE, on the other hand, could infer the probability of earnings management, both accruals and real earnings management, and even violations of financial reporting rules which result in restatements or errors. The model is, hence, particularly useful for subsequent studies which aim to detect earnings management but do not make a prediction about which methods have been used to manipulate earnings or what motives are behind such manipulation. Second and most important, we contribute new evidence to the ‘market anomalies’ literature showing that the market not only misprices earnings management but also fails to fully appreciate the information contained in the context surrounding such manipulation. Although previous studies document that the market under-reacts to fundamental-based composite scores which accumulate individual signals (e.g. Piotroski 2000; Mohanram 2005), our evidence clearly makes a significant incremental contribution.
For example, Piotroski’s FSCORE and Mohanram’s MSCORE are developed for the particular settings of high and low book-to-market firms, respectively, and the selection of signals to create the composite scores is deliberately to pick up signals related to financial strengths. Therefore, while the abnormal returns earned by Piotroski’s and Mohanram’s models might suggest that the market under-reacts to firm’s financial strength, it does not tell us how the market reacts to earnings management. Beneish et al. (2013) show that the MSCORE model, designed to detect firms which have been charged with or publicly admit to earnings manipulations, could also predict future returns. However, we develop ESCORE by selecting signals which are related to high likelihood of earnings management, regardless of whether it has been detected or not, and could be applied to any firm. Additionally, our evidence of abnormal returns earned by an ESCORE-based trading strategy suggests that the market also misprices another dimension of fundamental information which is related to the context of earnings management.

The rest of the paper is organised as follows. Section 2 reviews prior literature and develops the testable hypothesis. Section 3 presents the UK institutional settings and explains sample selection procedures. Section 4 outlines main methodologies employed in the paper. Section 5 presents and discusses the results. Section 6 provides some concluding remarks.

2 Related literature and hypothesis

The existing literature offers a handful of earnings management detection models. The most popular method measures discretionary accruals as the deviation of actual accruals from a ‘normal’ level of accruals estimated using some firm-specific characteristics (Jones 1991; Dechow et al. 1995; Peasnell et al. 2000). The discretionary accruals model helps to detect one type of earnings management, namely managers exercising their discretion over accounting methods to influence reported earnings. Other researchers (Roychowdhury 2006; Athanasakou et al. 2009, 2011; Gunny 2010; Zang 2012; Sakaki et al. 2017) argue that to change reported earnings, managers do not necessarily resort to playing around with accounting methods and estimations; rather, they could change real operating decisions, such as sales policies, production levels, discretionary expense spending (e.g. advertising, R&D), etc. Such real earnings management has become increasingly popular given the stricter financial reporting regulations (Cohen et al. 2008). To detect real earnings management, the existing literature normally measures the deviation of the actual level of real activities from the expected level derived using some firm-specific information (Roychowdhury 2006; Srivastava 2019). Besides the aforementioned two methods, there are other models that detect other types of earnings management, such as timing of asset sales, classificatory shift, earnings guidance, etc. (Athanasakou et al. 2009, 2011; Gunny 2010; Twedt 2016).

Another strand of the literature develops earnings management detection models which are based on a combination of individual signals or firm characteristics. Beneish (1997), for example, develops a model, based on twelve signals which may reveal managerial incentives, to distinguish GAAP violators from accruals aggressors. Beneish (1999) provides an accounting-based index which could help to assess the likelihood of earnings overstatement. Dechow et al. (2011) develop a model, namely FSCORE, which can help predict the likelihood of earnings restatement. They start with an analysis of the characteristics of restated firms and employ a logistic regression to estimate the relation between a firm’s characteristics and the likelihood of misstatement. FSCORE is used as a ‘barometer’ for financial statement users to assess the likelihood of earnings misstatements quickly and in a timely manner. The models developed by Beneish (1997, 1999) and Dechow et al. (2011) are more practical and offer advancements to the earlier models (such as Jones 1991; Dechow et al. 1995, etc.) because they are validated using ex-post indicators of earnings management which typically have a very low Type I error rate (Dechow et al. 2010). However, these models are not entirely free from limitations. One issue is that these studies focus on firms that are subject to enforcement by an authority, such as the Securities and Exchange Commission (SEC hereafter), and such firms are typically large, since government agencies like the SEC would normally aim to maximise public benefits given their constrained budgets. In addition, Dechow et al. (2010) highlight that the SEC is more likely to target egregious misstatements and avoid ambiguous cases of aggressive but within-GAAP earnings management. Thus, the ex-post indicators could potentially suffer from a high Type II error rate, and the predictive power of the resulting models can be generalised neither to other markets nor to earnings management firms which are not subject to SEC enforcement.

One way to get around the problem of the high Type II error rate of the ex-post indicators of earnings management is to evaluate the likelihood of earnings management based solely on the expected distribution of figures reported in financial statements. A recent and interesting strand of the literature argues that financial statement numbers, like many other natural sets of numbers, should theoretically be distributed according to Benford’s law. This law posits that a figure reported on financial statements is most likely to have one as its first digit, and that the probability decreases as the first digit gets larger. Hence, if a set of financial statements reports figures which are distributed too differently from what is expected under Benford’s law, it is a sign that those figures have been manipulated. Amiram et al. (2015) develop a measure, called FSD_SCORE, which captures the deviation of the distribution of financial statement figures of a firm in a given year from the theoretical distribution posited by Benford’s law and provide evidence that FSD_SCORE is effective in flagging up financial statements which contain errors. Nguyen et al. (2021) confirm the effectiveness of using Benford’s law in predicting earnings management in the UK context. The advantage of this approach is that FSD_SCORE does not require time-series or cross-sectional data to be estimated, nor does it rely on an external indicator of earnings management. Amiram et al. (2015, p. 1558) suggest that FSD_SCORE could be a ‘useful tool to augment existing techniques to access accounting data quality’.
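
The first-digit logic described above can be illustrated with a short sketch. Under Benford’s law the expected probability of a leading digit d is log10(1 + 1/d). The deviation measure below is a simple mean absolute deviation across the nine digits; it is shown only as one plausible way to summarise the gap between observed and expected frequencies and is an assumption here, since this section does not state the exact formula behind FSD_SCORE. The function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a non-zero number (the sign is ignored)."""
    # Scientific notation puts the leading digit first, e.g. '4.200...e+03'.
    return int(f"{abs(x):.15e}"[0])


def mean_abs_deviation(figures):
    """Mean absolute deviation between observed and expected first-digit
    frequencies (assumed summary statistic, not taken from the source)."""
    digits = [first_digit(x) for x in figures if x != 0]
    if not digits:
        raise ValueError("need at least one non-zero figure")
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return sum(abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)) / 9


# Hypothetical line items from one firm-year's financial statements.
figures = [1200.0, 1850.5, 230.0, 310.2, 19.4, 4700.0, 150.0, 920.0]
deviation = mean_abs_deviation(figures)
```

A larger value of `deviation` indicates a first-digit distribution further from the Benford benchmark; with only a handful of figures the statistic is noisy, so a realistic application would use all line items of a firm-year.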

Sloan (1996) initiates another strand of accounting literature by showing that accruals are negatively related to future returns. Xie (2001) goes even further, showing that it is the discretionary accrual component which mainly drives Sloan’s results. The evidence seems to suggest that the market misprices the information contained in accruals, and especially the component over which managers could exercise their discretion to manipulate. To date, the evidence of the market mispricing total and discretionary accruals remains one of the most persistent ‘market anomalies’, with many studies confirming its existence (e.g. Desai et al. 2004; Cheng and Thomas 2006; Mashruwala et al. 2006; Soares and Stark 2009; Choy et al. 2021). However, despite the established evidence of the existence of the accruals anomaly, the reason behind it is still controversial. Sloan (1996) attributes the accruals anomaly to the market’s systematic overestimation of the persistence of accruals and underestimation of the persistence of cash flows. That the market irrationally misprices accruals is supported by, for example, Hirshleifer et al. (2012), Allen et al. (2013) and Papanastasopoulos (2020). On the other hand, other authors suggest that the returns to trading strategies designed to exploit the market mispricing of accruals are a fair compensation for risk (Khan 2008) or rational under the Q-theory of investment (Wu et al. 2010).

As accruals, and especially their discretionary component, are widely used as a measure of earnings management, the literature tends to suggest that the market does not see through, and hence underreacts to, earnings management. Nevertheless, one issue remains unexplored. While a manager can presumably hide the ‘true’ earnings through earnings management, he or she cannot hide the surrounding context (for example, the manager’s motivations, pressures or opportunities for managing earnings, etc.). Hence investors should be able to ‘sense’ the existence of earnings management by observing the surrounding context. If investors are still ‘fooled’ by earnings management, it implies that they must have mispriced both the magnitude and the context in which such manipulation occurs. In other words, if we could document that the market also misprices the context of earnings management incrementally beyond the mispricing of discretionary accruals, it would offer more convincing evidence that the market could not see through earnings management, which is crucial to understanding how the market processes publicly available financial statement information. Following this intuition, we form the hypothesis as follows:

Ceteris paribus, firms with a more (less) susceptible context of earnings management yield lower (higher) abnormal returns.

To empirically test this hypothesis, this paper develops a model that captures the context of earnings management and investigates if such model can predict future stock returns.

3 Sample selection

The sample comprises all UK listed stocks during the period from 1995 to 2012.Footnote 1 The paper builds upon a rich literature on earnings management and the market mispricing of accruals, of which many existing studies focus on the US market. Nevertheless, the choice of the UK market as the setting for this study is justifiable for a variety of reasons which make the paper an interesting and important contribution to the advancement of our knowledge on these topics. First, the UK market offers a unique setting shaped by several characteristics of the environment in which listed companies operate, ranging from financial reporting and corporate governance regulations, cultural factors and the norms in business and reporting practice, to the popular bases of share ownership in listed companies. For example, the UK business norms and financial reporting practices, especially when it comes to selecting the mechanism for earnings management, are quite different from those of other developed markets such as the US (see, for example, Bond 2000; Athanasakou et al. 2009, 2011). In addition, institutional holdings tend to be more prevalent in the UK compared to other markets. Institutional stockholders, especially financial institutions and professionally-managed funds, typically play a more active monitoring role, which in turn constrains managers’ discretion over financial reporting practices (Chung et al. 2002). These characteristics suggest that using the UK market as a setting for research on earnings management detection would yield interesting and unique insights.

Second, the UK is one of the world’s major economies with one of the largest stock markets. Hence, the importance of understanding how the UK market operates cannot be overstated. Compared to the US, the UK-based literature on earnings management is remarkably thinner, which creates an important gap for further studies to fill. This paper does not simply replicate a US study in the UK; rather, it provides evidence which is directly relevant in the UK context with a number of implications for other developed markets such as the US. Having said that, while we leave a full replication of the study (in the US or any other market) for future research, we offer reassurance for subsequent studies to build upon what we find by showing that most of the main conclusions from this paper could be generalised to, at least, the US market.

Finally, using the UK market allows us to make use of a unique dataset which could significantly add strength to our analysis as well as reinforce previous findings using US data. One of the recent strands in the earnings management literature employs an ex-post indicator of earnings management, an approach most popular in the US thanks to rich and readily available data on earnings restatements, such as the SEC’s accounting and auditing enforcement releases or the US Government Accountability Office’s (GAO) releases of restatements. These ex-post measures of earnings management have significant advantages as well as drawbacks (see, for example, Dechow et al. 2010). The most notable pitfall of the ex-post measures is sample selection bias. In particular, the SEC or GAO does not randomly select firms to investigate. Due to constrained resources, they follow specific strategies to target firms for investigation, such as prioritising large companies, unambiguous cases, or serious frauds. As far as the accounting profession is concerned, such pitfalls cannot be completely corrected. However, they can be mitigated by having more datasets where the investigated firms are selected by other authorities applying different sampling strategies. One of the analyses in this paper employs the sample of firms subjected to investigation by the UK Financial Reporting Review Panel (FRRP), which has a rather different sampling strategy compared to the SEC and GAO in the US (although the objectives might be similar). Hence, our evidence linking the FRRP-investigated firms with earnings management complements and further mitigates the concerns of using ex-post measures of earnings management. Moreover, our use of the FRRP data also makes the paper especially distinctive compared to previous UK-based studies.

To avoid survivorship bias, we include both live and dead stocks. We exclude both financial and utilities firms due to their distinct financial reporting requirements. Datastream is the main source of financial data, except for external auditor and merger and acquisition deals for which we use Bloomberg. Data from Datastream and Bloomberg are combined using the International Securities Identification Number (ISIN). Therefore, we also exclude firms which do not have an ISIN. For firms which have more than one type of common stocks, only one is included in the sample. To ensure comparability, we restrict our sample to only those firms which report financial results in British Pound Sterling and whose financial years have between 350 and 380 days. Our sample also excludes firms with market value of less than £1 million to avoid very small firms which are typically thinly traded in practice but can influence the returns on the equally-weighted portfolios. We also exclude stocks with negative market-to-book ratios. Finally, we require data to be available to calculate the variables as described in the Appendix (except for corporate governance and compensation variables) to arrive at the final main sample. The sampleFootnote 2 consists of 11,920 firm-year observations from 1866 unique firms across 43 Datastream level-six industries. We winsorise all continuous variables at the 1st and 99th percentiles to mitigate the influence of extreme values.

4 Research methodologies

4.1 The construction of ESCORE

One of the key contributions of our study is the empirical measure of the context of earnings management. Watts and Zimmerman (1986) provide the theoretical foundation for the establishment of a context in which firms might act in certain ways when it comes to exercising discretion over the choice of accounting practices. In particular, earnings management in this framework is likely to exist in the presence of one or a combination of the following triggers: a bonus plan (the bonus plan hypothesis), high cost of renegotiation of debt (the debt/equity hypothesis) and high political cost (the political cost hypothesis). Subsequent studies in this large, and still growing, literature add significantly to the list of earnings management signals, which have outgrown the three broad categories suggested by Watts and Zimmerman (1986). The subsequent literature generally categorizes earnings management signals into managerial incentives and pressures, with added dimensions which cover the scope for earnings management to happen, including the practical constraints and the firm’s other innate factors (see, for example, Healy and Wahlen 1999, Beneish 2001, Dechow et al. 2011).

Within the scope of this study, we define the ‘context of earnings management’Footnote 3 as (a) the incentives to manage earnings, (b) the pressures under which managers are more likely to resort to earnings management, (c) the constraints on earnings management, and (d) the innate factors of the firm which could indicate the existence of earnings management. To capture the context of earnings management, we construct a composite index named ESCORE which is the sum of 15 individual binary variables, each taking a value of one if a firm has a suspicious signal and zero otherwise. We select these signals based on the rich extant earnings management literature and discuss their construction in the sub-sections below. In selecting the individual signals, we focus on those signals that suggest higher likelihood of earnings management without preference to either aggressive or conservative practices. As a result, ESCORE is not a signed measure of earnings management. We conjecture that the relation between ESCORE and future returns comes from the power of the ESCORE to reveal the context in which earnings management is likely, and not from its ability to reveal the sign and magnitude of such manipulation.
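
As a minimal sketch of the aggregation step, the composite score can be computed as a plain sum of binary indicators. The function name `escore` and the example firm-year are hypothetical, and the dictionary lists only a subset of the 15 signals for illustration.

```python
def escore(signals):
    """Sum binary signals (each 0 or 1) into a composite score."""
    for name, value in signals.items():
        if value not in (0, 1):
            raise ValueError(f"signal {name!r} must be 0 or 1, got {value!r}")
    return sum(signals.values())


# Hypothetical firm-year with three of the listed indicators switched on.
example = {"ESEO": 1, "EDDEBT": 0, "EMA": 0, "EOV": 1, "EROA": 0, "ESIZE": 1}
score = escore(example)
```

Because every signal carries equal weight and none is signed, the score only counts how many suspicious conditions are present; it says nothing about the direction of any manipulation, consistent with the unsigned interpretation given above.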

4.1.1 Benchmark construction procedure

To construct ESCORE, we first need a ‘benchmark’ for each individual signal. For example, we know from prior literature that small firms are more susceptible to earnings management (Lang and Lundholm 1993; Dechow and Dichev 2002). However, we need a ‘benchmark’ to determine which firms should be classified as ‘small’. Such ‘benchmarks’ should reflect the characteristics of the corresponding industry. We explain the procedures to construct these benchmarks below.

First, for each industry-year,Footnote 4 we rank firms based on \(\gamma\) (where \(\gamma\) is substituted by the relevant financial signals used in this study). Next, we use the 20th and 80th percentiles in each industry-year as the lower and upper benchmarks, denoted as \(\gamma_{k,t}^{20}\) and \(\gamma_{k,t}^{80}\) respectively, where k = 1… 43 represents unique Datastream level-six industries in the sample, and t = 1995… 2011 represents 17 sample years.Footnote 5 If a signal is lower (higher) than \(\gamma_{k,t}^{20}\) (\(\gamma_{k,t}^{80}\)), we consider it too low (high). We apply this procedure to all individual signals that require an industry-specific benchmark to construct. Next, we construct ESCORE using individual signals under four broad categories.
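
The benchmark procedure can be sketched as follows. The percentile convention (linear interpolation between order statistics) is an assumption, since the text does not specify one, and the function names and input layout are hypothetical.

```python
from collections import defaultdict


def percentile(values, p):
    """p-th percentile by linear interpolation between order statistics.
    This is one convention among several; the source does not name one."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def industry_year_benchmarks(rows):
    """rows: iterable of (industry, year, value) for one financial signal.
    Returns {(industry, year): (lower_benchmark, upper_benchmark)}."""
    groups = defaultdict(list)
    for industry, year, value in rows:
        groups[(industry, year)].append(value)
    return {key: (percentile(v, 20), percentile(v, 80)) for key, v in groups.items()}


# Hypothetical data: eleven firms in industry 1 during 1995.
rows = [(1, 1995, float(v)) for v in range(1, 12)]
benchmarks = industry_year_benchmarks(rows)
```

A signal value below the first element of the tuple would be treated as too low and a value above the second element as too high for that industry-year.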

4.1.2 Incentives to manage earnings

The first category covers the incentives for managing earnings, including equity issue, debt issue, share-for-share merger and acquisition, and stock overvaluation. Prior evidence suggests that firms inflate earnings prior to equity issues (Teoh et al. 1998b; Cohen and Zarowin 2010; DuCharme et al. 2004; Siew Hong and Wong 2002; Rangan 1998; Shivakumar 2000; Iqbal et al. 2009; Iqbal and Strong 2010, Kothari et al. 2016). This study defines the indicator of equity issue, denoted as ESEO, as a dummy that takes a value of one if (i) a firm’s outstanding shares at the end of the current fiscal year (year t) increase by at least 5% compared to last year’s (year t-1) and (ii) there are positive proceeds from issuing common/preferred stocks in the current year, zero otherwise.

Managers may also like to ‘decorate’ financial statements prior to a major debt issue to negotiate the cost of debt down. Athanasakou and Olsson (2012) find a positive relation between an indicator of debt issue and earnings management. To capture debt-issue-related incentives to inflate earnings, we define the indicator of debt issue, EDDEBT, as a dummy that takes a value of one if DDEBT is 5% or higher and zero otherwise, where DDEBT is calculated as the percentage change of total of short- and long-term debtFootnote 6 at the end of the current year (year t) compared to last year’s (year t-1) total debt. The 5% benchmark is employed to ensure that the issue is large enough for managers to consider managing earnings.
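
The equity-issue and debt-issue indicators defined above both rest on a 5% year-on-year change, and can be sketched as below. The function and parameter names are hypothetical. The handling of a non-positive prior-year base (returning `None`) is an assumption, as the text does not say how such cases are treated.

```python
def pct_change(current, previous):
    """Year-on-year percentage change, or None when the base is not positive
    (assumed treatment; the source does not specify it)."""
    if previous <= 0:
        return None
    return (current - previous) / previous


def eseo(shares_t, shares_prev, equity_proceeds_t, threshold=0.05):
    """1 if shares outstanding grew by at least 5% AND issue proceeds are positive."""
    growth = pct_change(shares_t, shares_prev)
    if growth is None:
        return None
    return int(growth >= threshold and equity_proceeds_t > 0)


def eddebt(total_debt_t, total_debt_prev, threshold=0.05):
    """1 if total (short- plus long-term) debt grew by at least 5%."""
    ddebt = pct_change(total_debt_t, total_debt_prev)
    if ddebt is None:
        return None
    return int(ddebt >= threshold)
```

For instance, under these assumptions a firm whose shares outstanding rise from 100 to 110 with positive issue proceeds is flagged by `eseo`, while the same increase with zero proceeds (for example, a stock split) is not.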

Firms also have strong incentives to inflate earnings prior to share-for-share mergers and acquisitions (M&A) in an attempt to temporarily push stock price up to minimise the number of shares paid (Erickson and Wang 1999; Botsari and Meeks 2008, 2018; Louis 2004). We define the indicator of share-financed M&A, denoted as EMA, as a dummy that takes a value of one if a firm announces an M&A deal within the current financial year for which shares are proposed as (part of) the payment method, zero otherwise.

Recent literature also considers the effect of stock market overvaluation on earnings management. Jensen (2005) conjectures that overvaluation creates a pressure on firms to inflate earnings to maintain their high market valuation. Empirical evidence also supports the premise that overvaluation induces income-increasing earnings management (Chi and Gupta 2009; Houmes and Skantz 2010; Badertscher 2011; Duong and Pescetto 2019). To capture this signal, we define the indicator of overvaluation, denoted as EOV, as a dummy that takes a value of one if a firm’s beginning of year t market-to-book ratio (MTB), calculated as market value to book value of equity, is higher than the corresponding \(MTB_{k,t}^{80}\), zero otherwise.

4.1.3 Pressures to manage earnings

We use various proxies in the model to capture the pressures to manage earnings, including meeting or just beating earnings benchmarks, financial distress, debt level, firm size, and business life cycle stage. Burgstahler and Dichev (1997) document a discontinuity of earnings around two important benchmarks, namely zero earnings and last year’s earnings. A similar pattern has also been documented in the UK (Gore et al. 2007; Al-Shattarat et al. 2018). The indicator employed to capture the pressure to meet or beat zero earnings benchmark, denoted as EROA, is defined as a dummy that takes a value of one if a firm’s returns-on-assets ratio (ROA), calculated as earnings before extraordinary items in year t scaled by beginning (year t-1) total assets, is equal to or larger than zero but smaller than 0.01, zero otherwise. EDROA, the indicator employed to capture the pressure to avoid reporting earnings decreases, is defined as a dummy that takes a value of one if a firm’s DROA, calculated as the change in earnings before extraordinary items in year t compared to that in year t-1 scaled by beginning total assets, is between zero and 0.005, zero otherwise.

Prior research shows that firms would engage in earnings management if the unmanaged earnings fall short of the expected dividends by a small amount (Daniel et al. 2008; Atieh and Hussain 2012). EDIV captures this pressure. It is a dummy that takes a value of one if a firm’s dividend deficit, denoted as DIVDEF and calculated as the difference between net income and total cash dividends in year t scaled by beginning total assets, is between zero and 0.01, zero otherwise.
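
The three ‘small positive margin’ indicators (EROA, EDROA and EDIV) share one pattern: a scaled quantity falling in a narrow band just above zero. The sketch below treats each band as including zero and excluding the upper bound. That matches the stated EROA definition, but it is an assumption for EDROA and EDIV, where the text only says ‘between zero and’ the threshold. Function and parameter names are hypothetical.

```python
def in_band(x, upper):
    """1 if 0 <= x < upper, else 0 (boundary treatment assumed for EDROA/EDIV)."""
    return int(0 <= x < upper)


def small_margin_signals(earnings_t, earnings_prev, net_income_t,
                         cash_dividends_t, assets_begin):
    """Benchmark-beating indicators, all scaled by beginning total assets.
    earnings_* are earnings before extraordinary items."""
    if assets_begin <= 0:
        raise ValueError("beginning total assets must be positive")
    roa = earnings_t / assets_begin
    droa = (earnings_t - earnings_prev) / assets_begin
    divdef = (net_income_t - cash_dividends_t) / assets_begin
    return {
        "EROA": in_band(roa, 0.01),     # just meets or beats zero earnings
        "EDROA": in_band(droa, 0.005),  # just avoids an earnings decrease
        "EDIV": in_band(divdef, 0.01),  # earnings just cover cash dividends
    }
```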

Financially distressed firms are understandably under pressure to inflate earnings. Garcia Lara et al. (2009) show that such firms manage earnings upwards. Beneish (1997) reports that financial distress is a factor that leads to GAAP violation. To capture the presence of these pressures, we estimate the UK-based ZSCORE (Taffler 1983). Taffler (1983) and Agarwal and Taffler (2007) show that UK firms with negative ZSCORE are more likely to become bankrupt. Following this evidence, we define EDISTRESS (the indicator of financial distress) as a dummy that takes a value of one if a firm’s ZSCORE in year t is negative, zero otherwise.

The use of debt also has implications for earnings management. Watts and Zimmerman (1986) suggest that debt contracts have a vital influence on a firm’s accounting policy. On one hand, higher debts induce pressures on firms to inflate earnings. Indeed, debts usually come with some covenants which firms need to comply with. Violating debt covenants leads to firms being penalised by lenders by means of higher cost of debt (Dichev and Skinner 2002; Dyreng et al. 2020). Therefore, firms with more debt are under greater pressure to manage earnings to avoid violation of debt covenants. DeFond and Jiambalvo (1994) find that abnormal accruals are significantly higher in the years preceding debt covenant violations. Ghosh and Moon (2010) find that firms with high debt have a strong incentive to manage earnings. On the other hand, however, the literature also suggests that firms with low levels of debt are also likely to engage in earnings management (Astami and Tower 2006). In addition, the evidence that financial leverage is positively related to accounting conservatism (for example, Watts 2003a, b; Pae 2007) implies that firms with little debt are less bound contractually and their reported earnings are less subject to scrutiny from lenders, hence there could be more scope for earnings management. In brief, the literature suggests that firms which have either too much or too little debt are suspected of earnings management. The ZSCORE, as explained earlier, has already captured firms with high debts. The indicator of firms with too little debt, denoted as EDEBT, is defined as a dummy that takes a value of one if a firm’s beginning of year t DEBT, measured as the total of short- and long-term debt scaled by year t total assets, is lower than the corresponding \(DEBT_{k,t}^{20}\), zero otherwise. EDEBT captures the context in which firms are subject to less scrutiny from lenders, hence have more room for managing earnings, in both directions.

It is also more difficult for large firms to manage earnings due to their high public visibility (Lang and Lundholm 1993; Dechow and Dichev 2002; Zhang et al. 2019). Smaller firms, on the contrary, usually face less public attention and they struggle to perform under various financial constraints. Hence small firms are often more likely to engage in earnings management, especially if the managers believe the struggles are just transitory. Indeed, various studies in the earnings management literature use firm size as a control variable and the evidence shows that firm size is related to discretionary accruals. ESIZE, the indicator of small firms, is a dummy that takes a value of one if a firm’s beginning of year t market value of equity (MVE hereafter) is lower than the corresponding \(MVE_{k,t}^{20}\), zero otherwise. ESIZE captures the context in which firms are subject to less scrutiny from the public, hence could have more room to manage earnings, in both directions.

The last variable in this group, ECYCLE, captures firms which are in the introduction and growth stage in their business life cycle. Young listed firms, most of which use funds from the capital markets for the first time, are usually under pressure to perform and grow. Accounting manipulation could be a way for these young listed firms to respond to such pressures (Beneish 1997; Dopuch et al. 1987). Growth firms usually face strong investment opportunities and are expected to deliver strong growth and financial performance. Fama and French (1995) show that growth firms typically report higher earnings. Lakonishok et al. (1994) suggest that the market generally places too much expectation on growth stocks which results in market overreaction. Under such pressure, firms might have to resort to earnings management should their underlying economic performance fall short of the expectation to avoid market penalty. Such prediction has been substantiated by empirical evidence (Skinner and Sloan 2002). Following Dickinson (2011), ECYCLE is defined as a dummy that takes a value of one if a firm’s operating cash flows are negative, financing cash flows are positive and investing cash flows are negative (introduction stage), or its operating and financing cash flows are positive while its investing cash flows are negative (growth stage), and zero otherwise. All cash flows are measured at year t-1.
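The cash-flow sign pattern behind ECYCLE can be illustrated with a minimal sketch. The function name and argument names below are hypothetical, and the treatment of a cash flow that is exactly zero (counted as neither positive nor negative) is an assumption rather than something stated in the text.

```python
def ecycle(operating_cf: float, investing_cf: float, financing_cf: float) -> int:
    """Return 1 for an introduction- or growth-stage cash flow pattern, else 0.

    All three cash flows are assumed to be measured at year t-1.
    A cash flow of exactly zero is treated as neither positive nor negative.
    """
    # Introduction stage: operating < 0, investing < 0, financing > 0
    introduction = operating_cf < 0 and investing_cf < 0 and financing_cf > 0
    # Growth stage: operating > 0, investing < 0, financing > 0
    growth = operating_cf > 0 and investing_cf < 0 and financing_cf > 0
    return int(introduction or growth)
```

Under this sketch, a firm with negative operating and investing cash flows funded by positive financing cash flows receives a signal of one, whereas a firm with positive investing cash flows does not.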

4.1.4 Constraints on earnings management

We include external auditor and balance sheet bloat in the model to represent constraints on earnings management. Prior literature shows that external audit quality plays a major role in constraining accruals management (Becker et al. 1998; Francis et al. 1999; Alzoubi 2016). Krishnan (2003) finds that firms whose external auditors have more industry experience, on average, have less discretionary accruals. Following this evidence, several studies use an indicator of firms being audited by the Big-5 auditors as a control variable in regression where the dependent variable is discretionary accruals and in general these studies report a significant negative relationship (Zang 2012; Athanasakou and Olsson 2012; Choi et al. 2018). Nevertheless, the existing evidence about the constraining role of external auditors is mixed regarding the sign of the manipulation. For example, Becker et al. (1998) predict that the presence of the Big-5 auditors is negatively related to the signed discretionary accruals, while Francis et al. (1999) only present evidence about the relationship between Big-5 external auditors and the absolute value of discretionary accruals. Overall, the absence of Big-5 auditors could give room for firms to manage earnings more easily, in both directions. We, therefore, use the absence of a Big-5 external auditor as a signal of earnings management, but do not predict the sign of the manipulation. EAUDIT, the indicator of the absence of Big-5 auditor, is defined as a dummy that takes a value of one if a firm is not audited by the Big-5 accountancy firms in year t. The Big-5 is defined as the following firms and their affiliates: Arthur Andersen, Deloitte Touche Tohmatsu, Ernst & Young, KPMG, PriceWaterhouseCoopers. Audit firms which are later merged with one of the Big-5 are also considered as part of the Big 5 (e.g. Coopers and Lybrand is deemed as PriceWaterhouseCoopers). 
If data on the auditor is missing from Bloomberg for a firm in a year, it is assumed that the firm is not audited by a Big-5 auditor.

Due to the self-reversing nature of accruals, we consider past use of accruals management to act as a constraint on further engagement (Barton and Simko 2002). Net operating assets (NOA) can proxy for ‘balance sheet bloat’, which captures the constraint induced by past engagement in accruals management (Houmes and Skantz 2010; Beuselinckc et al. 2019). Firms with high NOA are likely to have engaged extensively in income-increasing accruals management in the past, which in turn constrains their ability to manage accruals further. Following this, we calculate NOA as the sum of net book value of equity and total debt minus cash and cash equivalents, all scaled by total assets. The indicator of low balance sheet bloat, denoted as EBLOAT, is a dummy that takes a value of one if a firm’s beginning of year t NOA is lower than the corresponding \(NOA_{k,t}^{20}\), zero otherwise.

4.1.5 Innate characteristics

Firms engage in earnings management not only because of managerial motives, but also because of some innate firm factors (Dechow and Dichev 2002; Francis et al. 2005, 2004; Athanasakou and Olsson 2012). Dechow and Dichev (2002) suggest some important innate factors which could imply earnings management, for example, the variability in fundamentals such as sales or cash flows, firm size, operating cycle and incidence of losses. Several of the innate factors identified in the extant literature as signals of earnings management, such as firm size, operating cycle and incidence of losses, have been covered earlier. We do not include innate factors, such as the variability of sales and cash flows, which require a long history of data to calculate. Requiring a long history of data would eliminate young firms from the sample, a practice that may introduce bias in the main analysis since some earnings management signals (e.g. ECYCLE) are designed to capture young firms. We also do not consider the intensity of intangible assets due to insufficient data to establish plausible industry benchmarks. However, to capture the intensity of tangible assets, we estimate CAP as the ratio of property, plant, and equipment to total assets. Prior literature shows that smaller CAP is associated with poor earnings quality, hence such firms are suspected of earnings management (Athanasakou and Olsson 2012; Francis et al. 2004). ECAP, the indicator of low intensity of tangible assets, is a dummy that takes a value of one if a firm’s beginning of year t CAP is smaller than the corresponding \(CAP_{k,t}^{20}\), zero otherwise. For firms which have an ECAP signal of one, we do not predict the sign of the manipulation.

Lastly, some studies document the effect of book-tax conformity on earnings management (Hanlon and Heitzman 2010; Athanasakou and Olsson 2012; Sundvik 2017). If one agrees that taxable profits are difficult and costly to manipulate, then the more accounting earnings diverge from taxable profits, the more likely it is that such accounting earnings have been manipulated. Generally, the evidence supports such intuition (Desai 2005). Following the literature, we calculate the book-tax difference, denoted as BOOKTAX, as the absolute value of the difference between year t’s reported pre-tax income and an estimate of total taxable profits, denoted by TTP, all scaled by sales in year t. We estimate TTP using the lower and upper limit for marginal tax relief (denoted LL and UL, respectively), small profit tax rate (SR) and main tax rate (MR) applicable at the time in conjunction with the reported income tax expenses (TXT). We source LL, UL, SR and MR in each sample year from HM Revenue & Customs (2013). With only published information, it is almost impossible to estimate TTP. Therefore, we make some assumptions to simplify the estimation. First, we assume that the reported tax expenses represent solely the amount of income tax levied in the considered period (i.e. no extraordinary penalty or retrospective payment or anything else of that nature). Second, for the profits that fall between the LL and UL, we assume that the tax rate is the average of SR and MR, denoted as AR, to avoid complex calculation. With these assumptions, TTP is worked back from the tax expenses as follows:

$$\begin{gathered} If \, TXT \, \le \, 0, \, then \, TTP \, = \, 0 \hfill \\ If \, 0 \, < \, TXT \, \le \, LL{\times}SR, \, then\;TTP = \frac{TXT}{{SR}} \hfill \\ If \, LL{\times}SR \, \le \, TXT \, \le \, \left( {UL \, {-} \, LL} \right){\times}AR, \, then\;TTP = \frac{{TXT - \left( {LL \times SR} \right)}}{AR} + LL \hfill \\ If \, TXT \, \ge \, \left( {UL \, {-} \, LL} \right){\times}AR, \, then\;TTP = \frac{{TXT - \left( {LL \times SR} \right) - \left[ {\left( {UL - LL} \right) \times AR} \right]}}{MR} + UL \hfill \\ \end{gathered}$$
(1)

We define EBT as a dummy that takes a value of one if a firm’s BOOKTAX is higher than the corresponding \(BOOKTAX_{k,t}^{20}\), zero otherwise. EBT, therefore, captures firms which have reported accounting earnings that are different from taxable profits, an indication that accounting earnings may have been managed, in both directions.
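To make the working-back of TTP from the tax expense concrete, the following is a minimal sketch rather than the exact procedure used in the paper. It assumes that the band boundaries are expressed as cumulative tax, i.e. the tax due on profits of exactly LL is LL × SR and the tax due on profits of exactly UL is LL × SR + (UL − LL) × AR; this is one reading of Eq. (1) that keeps the mapping continuous. The function names and the numerical values in the usage note are hypothetical.

```python
def taxable_profit(txt: float, ll: float, ul: float, sr: float, mr: float) -> float:
    """Work back an estimate of total taxable profits (TTP) from tax expense (TXT).

    Assumption: band boundaries are cumulative tax amounts, and profits
    between LL and UL are taxed at AR, the average of SR and MR.
    """
    ar = (sr + mr) / 2.0
    tax_at_ll = ll * sr                      # tax due on profits of exactly LL
    tax_at_ul = tax_at_ll + (ul - ll) * ar   # tax due on profits of exactly UL
    if txt <= 0:
        return 0.0
    if txt <= tax_at_ll:
        return txt / sr
    if txt <= tax_at_ul:
        return (txt - tax_at_ll) / ar + ll
    return (txt - tax_at_ul) / mr + ul


def booktax(pretax_income: float, txt: float, sales: float,
            ll: float, ul: float, sr: float, mr: float) -> float:
    """Absolute book-tax difference scaled by sales (sales assumed non-zero)."""
    ttp = taxable_profit(txt, ll, ul, sr, mr)
    return abs(pretax_income - ttp) / sales
```

For example, with illustrative limits of 300,000 and 1,500,000 and illustrative rates of 20% and 28%, a tax expense of 204,000 maps to taxable profits of about 900,000 under these assumptions. EBT would then be set to one when the resulting BOOKTAX exceeds the relevant benchmark.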

4.1.6 The ESCORE

We define the composite ESCORE as the sum of all 15 individual binary signals as follows:

$$\begin{aligned} ESCORE \, = \, & ESEO \, + \, EDDEBT \, + \, EMA \, + \, EOV \, + \, EROA \, + \, EDROA \, \\ & + \, EDIV \, + \, EDISTRESS \, + \, EDEBT \, + \, ESIZE \, + \, ECYCLE \, \\ & + \, EAUDIT \, + \, EBLOAT \, + \, ECAP \, + \, EBT \\ \end{aligned}$$
(2)

As designed, ESCORE is an integer which can theoretically range from 0 to 15. The smaller (larger) the ESCORE, the less (more) suspicious the context of earnings management surrounding a firm is. Because ESCORE aggregates 15 individual signals, an immediate question is whether those signals are correlated and thus could be reduced to a more parsimonious model through, for example, principal component analysis. To address this possibility, we calculate the eigenvalues from a principal component analysis. The results are reported in Table 1. The first principal component, which has the largest variance of any linear combination of the individual scores, explains only 12.83% of the total variance. Subsequent principal components contribute even less, ranging from 9.37% to 3.96%. Looking at the eigenvectors, we do not find an especially high loading on any particular variable, which suggests that none of the individual scores plays a dominant role in the variance of the composite ESCORE. Overall, it seems unlikely that variable reduction through principal component analysis would significantly affect the ESCORE compared to the simple sum-of-binary-variables approach.

Table 1 Eigenvalues of the correlation matrix from principal component analysis

Table 2 shows the distribution of firm-year observations across ESCORE portfolios. Although ESCORE could theoretically range from 0 to 15, no firm in our sample accumulates more than 9 signals. In subsequent analyses, we pay particular attention to the portfolios of low and high ESCORE stocks. For this purpose, we group stocks with ESCORE of zero into the Low ESCORE group, those with ESCORE of six and above into the High ESCORE group, and the rest into the Medium ESCORE group. The cut-off is arbitrary: since fewer stocks have larger ESCORE, pooling all stocks with ESCORE of six and above (865 observations) ensures that the High ESCORE portfolio has a number of observations comparable to its Low ESCORE counterpart (which comprises 862 stocks with ESCORE of zero). Intuitively, our grouping scheme is equivalent to considering that the context surrounding a stock which has accumulated six or more signals is highly susceptible to earnings management.Footnote 7
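The construction of ESCORE in Eq. (2) and the grouping scheme described above can be sketched as follows. The signal names are those of Eq. (2); the function names and the dictionary-based input format are hypothetical choices made only for illustration.

```python
# The 15 binary signals that enter Eq. (2).
SIGNALS = (
    "ESEO", "EDDEBT", "EMA", "EOV", "EROA", "EDROA", "EDIV", "EDISTRESS",
    "EDEBT", "ESIZE", "ECYCLE", "EAUDIT", "EBLOAT", "ECAP", "EBT",
)


def escore(signals: dict) -> int:
    """Sum the 15 binary signals of one firm-year (each expected to be 0 or 1)."""
    return sum(int(signals[name]) for name in SIGNALS)


def escore_group(score: int) -> str:
    """Map an ESCORE to the Low (0), Medium (1 to 5) or High (6 and above) group."""
    if score == 0:
        return "Low"
    if score >= 6:
        return "High"
    return "Medium"
```

A firm-year with no signals falls into the Low group, while one that triggers, say, ESEO, EMA, EROA, EDISTRESS, ESIZE and EAUDIT has an ESCORE of six and falls into the High group.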

Table 2 Distribution of firms across ESCORE groups

4.2 Measures of accruals and real earnings management

To test whether the context of earnings management is effective in predicting future stock returns, we first need to test whether ESCORE indeed captures such context. We illustrate the effectiveness of ESCORE in detecting the context of earnings management by looking at how other traditional measures of earnings management (e.g. discretionary accruals and real earnings management proxies) vary as the context (captured by ESCORE) changes. For this, we consider six proxies of earnings management.

To begin, we employ the cross-sectional version of the modified-Jones model (Jones 1991; Dechow et al. 1995) to estimate discretionary accruals (DAC).Footnote 8 In this model, total accruals are calculated as the difference between income before extraordinary items and net operating cash flows. The calculation of total accruals follows the cash flows approach to avoid the potential measurement errors identified by Hribar and Collins (2002).Footnote 9 To obtain DAC, we run regressions in each (Datastream level-six) industry-year with at least fifteen observations.Footnote 10

Although there are other competing models to estimate discretionary accruals (Dechow et al. 1995; Guay et al. 1996; Bernard and Skinner 1996; Young 1999; Thomas and Zhang 2000; Peasnell et al. 2000; Fields et al. 2001), the existing literature generally suggests that there is no other model that clearly outperforms the modified-Jones model (Peasnell et al. 2000; Botsari and Meeks 2008). Nevertheless, many UK studies focus only on working capital accruals arguing that depreciation is not a suitable means to manage earnings since it is highly visible and if earnings are managed through depreciation, the effects could be unwound quite easily by financial statement users (Young 1999; Peasnell et al. 2000; Gore et al. 2007). To account for this argument, we also estimate discretionary working capital accruals (DWAC) using the ‘margin model’ as described in Peasnell et al. (2000), which has been shown to work well in the UK context.

We next consider real earnings management by following Roychowdhury (2006) to estimate three measures of real earnings management, namely the abnormal cash flow (DCF), abnormal production cost (DPROD) and abnormal discretionary expense (DDISEXP). Regressions are estimated for each (Datastream level-six) industry-year with at least fifteen observations. DPROD is exactly as described in Roychowdhury (2006). Nevertheless, to make their sign consistent with other measures of earnings management used in this paper, we multiply Roychowdhury’s (2006) measures of abnormal cash flow (DCF) and abnormal discretionary expenses (DDISEXP) by -1. As a result, a positive value of DCF and DDISEXP would imply income-increasing earnings management and vice versa.

DCF, DPROD and DDISEXP capture three dimensions of real earnings management, namely the manipulation of sales activities, production activities, and discretionary expenses. These three ways of managing earnings could be used as substitutes, i.e. a manager would manipulate earnings through changing real operation decisions in one or two areas out of the three, and not necessarily all of them at the same time. As a result, for example, when the context suggests a firm is inflating earnings and the firm decides to do it through sales manipulation, DPROD and DDISEXP are not necessarily high. It is, hence, important to look at the overall real earnings management strategy rather than just the individual ones. To facilitate this, we also construct a composite measure that pools together the three measures of real earnings management as follows:

$$TOTALRM_{i,t} = \left[ {\frac{{DCF_{i,t} - \overline{{DCF_{t,k} }} }}{{\sigma \left( {DCF} \right)_{t,k} }} + \frac{{DPROD_{i,t} - \overline{{DPROD_{t,k} }} }}{{\sigma \left( {DPROD} \right)_{t,k} }} + \frac{{DDISEXP_{i,t} - \overline{{DDISEXP_{t,k} }} }}{{\sigma \left( {DDISEXP} \right)_{t,k} }}} \right]/3\left( {i \in k} \right)$$
(3)

where \(TOTALRM_{i,t}\) is the composite measure of real earnings management of firm i in year t; \(\overline{{DCF_{t,k} }}\), \(\overline{{DPROD_{t,k} }}\), \(\overline{{DDISEXP_{t,k} }}\) [\(\sigma \left( {DCF} \right)_{t,k}\), \(\sigma \left( {DPROD} \right)_{t,k}\), \(\sigma \left( {DDISEXP} \right)_{t,k}\)] are, respectively, the mean [standard deviation] of DCF, DPROD, DDISEXP of all firms in industry k in year t; and k = 1…43 indexes the 43 unique Datastream level-six industries.

The above procedure converts DCF, DPROD and DDISEXP into standardised variables before averaging them. \(TOTALRM_{i,t}\), therefore, captures the combined effects of Roychowdhury’s (2006) real earnings management strategies. Compared to other studies which simply add DCF, DPROD and DDISEXP together (e.g. Cohen and Zarowin 2010), our approach is advantageous because our standardization process could mitigate the concerns regarding adding variables with different distributions.
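The standardise-then-average step of Eq. (3) can be illustrated with a short sketch. The input format (a list of dictionaries with the keys shown) and the function name are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population version is used. Each industry-year is assumed to contain at least two firms with non-constant values.

```python
from collections import defaultdict
from statistics import mean, stdev

MEASURES = ("DCF", "DPROD", "DDISEXP")


def total_rm(rows):
    """Return {(firm, year): TOTALRM} from rows with keys
    'firm', 'industry', 'year', 'DCF', 'DPROD' and 'DDISEXP'."""
    # Group firm-years by industry-year.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["industry"], row["year"])].append(row)

    result = {}
    for members in groups.values():
        # Mean and sample standard deviation of each measure in the industry-year.
        stats = {}
        for m in MEASURES:
            values = [r[m] for r in members]
            stats[m] = (mean(values), stdev(values))
        # Standardise each measure, then average the three z-scores.
        for r in members:
            z_scores = [(r[m] - stats[m][0]) / stats[m][1] for m in MEASURES]
            result[(r["firm"], r["year"])] = sum(z_scores) / 3.0
    return result
```

Because each measure is converted to a z-score before averaging, a measure with a wide dispersion within an industry-year does not dominate the composite.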

The six measures of earnings management as described above are then employed to test the efficacy of ESCORE in capturing the context of earnings management. ESCORE is primarily designed to capture the context in which earnings management is more likely to occur, not the sign of such manipulation. Some components of ESCORE, including ESEO, EDDEBT, EMA, EOV, EBLOAT, EROA, EDROA, EDIV, EDISTRESS, ECYCLE, predict inflationary (i.e. aggressive) earnings management, while others, including EAUDIT, EBT, ECAP, EDEBT, ESIZE, only suggest the presence of earnings management behaviour regardless of the sign. We, therefore, test the effectiveness of ESCORE in two ways. First, we examine if ESCORE is able to indicate the presence of earnings management, in both directions, by looking at how the absolute values of DAC, DWAC, DCF, DPROD, DDISEXP and TOTALRM (denoted by ADAC, ADWAC, ADCF, ADPROD, ADDISEXP and ATOTALRM, respectively) vary across ESCORE groups. Second, as most of the components of ESCORE suggest an inflation of earnings, we also expect that ESCORE could identify the context in which the most aggressive earnings management occurs. For investors, aggressive earnings management is arguably more harmful, hence it is important to see if ESCORE can indicate those circumstances. For this purpose, we examine the association of ESCORE with the indicators of aggressive earnings management, denoted by HDAC, HDWAC, HDCF, HDPROD, HDDISEXP and HTOTALRM. We define these as the dummy variables that take a value of one if the stock is in the top quintile ranked in each industry-year by DAC, DWAC, DCF, DPROD, DDISEXP and TOTALRM, respectively.

4.3 Calculation of returns

We calculate ESCORE for calendar year t (t = 1995,…, 2011) for all stocks with fiscal year ending in any month of the year. For each year, we then sort firms by their ESCORE. Based on the ESCORE of year t, we form portfolios at the beginning of June of year t + 1 and hold them until the end of May of year t + 2. For each month, we estimate the buy-and-hold raw return for each stock, denoted by \(BHRR_{i,j}^{m}\), as the percentage change in Datastream’s Return Index, which assumes dividend reinvestment. If a stock delists during the holding period, we treat the delisting returns as follows. If a stock does not have a monthly return for June (the first month after portfolio formation), we exclude the firm-year observation from the sample (equivalent to assuming that investors cannot consider the stock for trading due to non-existence). If a stock has a return for June, but then delists before the end of the holding period for non-performance-related reasons, we assume that investors earn the returns from the portfolio formation date to the delisting date, and then reinvest the proceeds in the size-matched portfolio, which presumably bears similar risk to the delisted firm. Prior studies (for example, Soares and Stark, 2009; Desai et al., 2004) use this approach to reflect the reality that the returns in most M&A-related delisting cases are positive. We estimate returns on the size-matched portfolio using a procedure similar to that used to calculate the size-adjusted returns described below. If the delisting is performance-related, we assume that the whole initial investment is lost, hence a delisting return of –100% is used.

To test the effectiveness of ESCORE-based trading strategies, the study uses various measures of buy-and-hold abnormal returns. First of all, we estimate firm-specific monthly buy-and-hold size-adjusted returns as follows. Each year, we sort all stocks with available data from Datastream into deciles based on market capitalization at the end of the previous fiscal year. We then estimate returns for each size decile portfolio d (d = 1… 10), \(SDR_{d,j}^{m}\), as the average \(BHRR_{i,j}^{m}\) of all stocks which belong to decile d. For each stock, its corresponding size decile and size decile returns are identified. Finally, we estimate the buy-and-hold size-adjusted return for stock i in month j, denoted by \(BHSAR_{i,j}^{m}\), as the difference between the raw return and the return on the corresponding size decile portfolio.
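The size adjustment for a single month can be sketched as below. The decile assignment is taken as given (it would come from the annual sort on market capitalization), and the function name and dictionary inputs are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def size_adjusted_returns(raw_returns: dict, size_decile: dict) -> dict:
    """Return {stock: raw return minus its size-decile portfolio return}.

    raw_returns: {stock: monthly raw return} for one month.
    size_decile: {stock: decile 1..10}, assumed to be assigned beforehand.
    """
    # Equally-weighted return of each size decile portfolio for the month.
    by_decile = defaultdict(list)
    for stock, ret in raw_returns.items():
        by_decile[size_decile[stock]].append(ret)
    decile_return = {d: mean(rets) for d, rets in by_decile.items()}

    # Size-adjusted return: raw return less the matching decile return.
    return {
        stock: ret - decile_return[size_decile[stock]]
        for stock, ret in raw_returns.items()
    }
```

In this sketch the decile portfolio return includes the stock itself, in line with the description of the decile return as the average over all stocks belonging to the decile.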

From the above firm-specific returns, the raw and size-adjusted returns of portfolio p, denoted by \(BHRR_{p,j}^{m}\) and \(BHSAR_{p,j}^{m}\), are respectively the equally-weighted \(BHRR_{i,j}^{m}\) and \(BHSAR_{i,j}^{m}\) of all stocks in portfolio p. Following Desai et al. (2004), to avoid the potential inflation of t-statistics when assessing the abnormal portfolio returns over time, we calculate \(BHSAR_{p,j}^{m}\) for each month and treat it as one observation. The t-statistics used to test if \(BHSAR_{p}^{m}\) and \(BHMAR_{p}^{m}\) are significantly different from zero are calculated from 204 time-series monthly observations (across 17 sample years).
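The time-series t-test described above treats each monthly portfolio abnormal return as one observation. A minimal sketch follows, assuming a standard one-sample t-statistic with the sample standard deviation; the function name is hypothetical and no adjustment for autocorrelation is applied.

```python
from math import sqrt
from statistics import mean, stdev


def time_series_t_stat(monthly_portfolio_returns):
    """One-sample t-statistic for the null that the mean monthly return is zero.

    Each element is the portfolio abnormal return for one calendar month,
    so the sample size is the number of months (204 in the paper's setting).
    """
    n = len(monthly_portfolio_returns)
    standard_error = stdev(monthly_portfolio_returns) / sqrt(n)
    return mean(monthly_portfolio_returns) / standard_error
```

Averaging across stocks within each month before testing means that cross-sectional dependence among stocks in the same month does not inflate the test statistic.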

We calculate \(BHSAR_{p,j}^{m}\) using reference portfolios, an approach which could bias the test statistics (Barber and Lyon 1997; Kothari and Warner 1997). In addition, size-adjusted returns are not capable of capturing some other known dimensions of risk, such as market-to-book and momentum factors. To mitigate these concerns, we also estimate the Fama–French model augmented by the momentum factor (Carhart 1997) as follows:

$$BHRR_{p,j}^{m} - Rf_{j} = \alpha + \beta_{1} \left( {Rm_{j} - Rf_{j} } \right) + \beta_{2} SMB_{j} + \beta_{3} HML_{j} + \beta_{4} UMD_{j} + \varepsilon$$
(4)

where \(BHRR_{p,j}^{m}\) is equally-weighted raw return of portfolio p for month j; \(Rf_{j}\), \(Rm_{j}\), \(SMB_{j}\), \(HML_{j}\), \(UMD_{j}\) are, respectively, the monthly risk-free rate, returns on the market portfolio, size, market-to-book and momentum factors, all as described and downloaded from the publicly available database of Gregory et al. (2013). We then calculate the monthly buy-and-hold portfolio abnormal returns using the estimated coefficients obtained from regression (4), denoted by \(BHAR4F_{p,j}^{m}\). Similar to the t-test employed for size-adjusted returns, the t-statistic used to test if \(BHAR4F_{p,j}^{m}\) is significantly different from zero is calculated from 204 time-series monthly observations.

Carhart’s (1997) approach to estimate abnormal returns is also not flawless, especially in the UK context (Lee et al., 2007; Bauer et al., 2010). Nevertheless, since we use both the reference and regression-based approaches, it would reasonably guard the results against any possible significant biases due to the way abnormal returns are calculated.Footnote 11

The monthly returns as calculated above are used in portfolio tests. For multivariate regressions, we compound monthly returns into annual buy-and-hold returns, denoted by an ‘a’ superscript in place of the ‘m’ after each measure of returns, to match with the annual update of the explanatory variables.Footnote 12
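Compounding monthly returns into an annual buy-and-hold return can be illustrated as follows. This is a sketch for raw returns only; how abnormal returns are compounded is not detailed here, and the function name is hypothetical.

```python
def annual_buy_and_hold(monthly_returns):
    """Compound a sequence of monthly returns (as decimals) into one return."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth - 1.0
```

For instance, a gain of 10% followed by a loss of 10% compounds to a return of about –1%, and a –100% delisting return in any month yields an annual return of –100% regardless of the other months.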

5 Results and discussions

5.1 Descriptive statistics and correlations

Table 3 presents some descriptive statistics of the main variables used in this study. Mean market value of equity, MVE, (£390 million) is larger than the median (£44 million) which suggests the existence of some very large observations. Those large firms could significantly influence the returns of value-weighted portfolios. The paper, therefore, reports the results from applying the equally-weighted scheme in the main portfolio tests.Footnote 13 The mean of ROA is –0.0072 while the median is 0.0451, which shows the existence of some very large negative values. This could be a sign of the presence of firms which ‘take a bath’ since such practice typically involves booking very large losses.

Table 3 Descriptive statistics (n = 11,920)

Table 4 presents correlations between the main variables. The correlations between individual signals are quite low (ranging from only 38.1% between EDISTRESS and EBT to ‒20.5% between EDEBT and EDDEBT) and insignificant in many cases. This suggests that the individual signals capture different and largely uncorrelated dimensions of the context of earnings management, which further supports the construction of ESCORE as the sum of all factors. ESCORE also shows a significant negative correlation with all measures of returns. This initial evidence suggests that ESCORE could predict stock returns.

Table 4 Correlations

5.2 ESCORE and the context of earnings management

5.2.1 Univariate analysis

To test the effectiveness of ESCORE in capturing the context of earnings management, we employ three tests. In the first test, we examine how the six measures of earnings management (as explained in Sect. 4.2) vary as the context of earnings management (i.e. the ESCORE) changes. Table 5 presents the mean of ADAC, ADWAC, ADCF, ADPROD, ADDISEXP, ATOTALRM (the absolute values) and HDAC, HDWAC, HDCF, HDPROD, HDDISEXP, HTOTALRM (the indicators of aggressive earnings management) across ESCORE groups, together with the t-test comparing the means of the High ESCORE group (ESCORE of six and above) with those of the Low ESCORE group (ESCORE of zero). The results show that as ESCORE increases, all of the 12 measures of earnings management also increase monotonically and consistently. The differences across all measures between the High ESCORE and Low ESCORE group are positive, economically large, and statistically significant. The results, therefore, suggest that ESCORE, despite being constructed using a completely different methodology, is consistent with other more traditional proxies of earnings management. An important feature of the ESCORE which could potentially be a useful tool for future research is that ESCORE could proxy for both accruals and real earnings management. In general, the evidence implies that ESCORE is highly effective in capturing the context of earnings management as when the context is more susceptible (higher ESCORE), firms indeed manage earnings in larger magnitudes and are more likely to be aggressors.

Table 5 Measures of accruals management and real earnings management across ESCORE groups

5.2.2 Multivariate regression

The univariate analysis discussed above has shown that ESCORE is able to capture both accruals and real EM. However, the univariate analysis suffers from possible problems of omitted variables. Particularly, in selecting the individual signals to include in the ESCORE, we deliberately select only those which could be easily constructed using financial statement information. Hence, some dimensions of the context of earnings management may have been omitted, most notably compensation and corporate governance. It has been shown that larger and more independent boards, and especially the audit committees, play a more effective monitoring role and hence constrain earnings management (e.g. Beasley 1996; Dechow et al. 1996; Klein 2002; Xie et al. 2003; Peasnell et al. 2005). With regards to compensation, the existing evidence generally suggests that where managers’ compensation package is linked to performance, they would have stronger motives to inflate earnings (e.g. Dechow and Sloan 1991; Holthausen et al. 1995; Guidry et al. 1999; Bergstresser and Philippon 2006; Burns and Kedia 2006; Efendi et al. 2007). It is important to determine if ESCORE is still related to other measures of earnings management after controlling for these omitted variables and the incremental magnitude of such relationship. In this section, we control for these omitted variables by considering the size of the board, audit committees, the independence of the boards, and the performance-linked components of executives’ compensation packages. In particular, we first estimate the following regressions:

$$\begin{aligned} AEM_{i,t} = &\, \alpha + \beta_{1} BOSIZE_{i,t} + \beta_{2} BOIND_{i,t} + \beta_{3} AUSIZE_{i,t} \\ & + \beta_{4} DUALITY_{i,t} + \beta_{5} PLCOM_{i,t} + \beta_{6} ESCORE_{i,t} \\ & + Year Fixed Effects + Industry Fixed Effects + \varepsilon \\ \end{aligned}$$
(5)

where AEM is replaced in each regression by ADAC, ADWAC, ADCF, ADPROD, ADDISEXP, ATOTALRM; BOSIZE is the number of board directors; BOIND is the percentage of non-executive directors on the board, used as a proxy for board independence; AUSIZE is the number of directors on the audit committee (equal to zero if a firm does not have an audit committee); DUALITY is a dummy which is zero if a firm’s CEO is also the chairman; PLCOM is the average performance-linked compensation of all executive directors scaled by sales, where performance-linked compensation is defined as the total of bonus, shares, options and other long-term incentive pay awarded during the year.

In a similar fashion, we also estimate the following logistic regressions of the indicators of aggressive earnings management on ESCORE and the above-mentioned control variables:

$$\begin{aligned} Logit(HEM_{i,t} ) = &\, \alpha + \beta_{1} BOSIZE_{i,t} + \beta_{2} BOIND_{i,t} + \beta_{3} AUSIZE_{i,t} + \beta_{4} DUALITY_{i,t} \\ & + \beta_{5} PLCOM_{i,t} + \beta_{6} ESCORE_{i,t} + Year Fixed Effects \\ & + Industry Fixed Effects + \varepsilon \\ \end{aligned}$$
(6)

where HEM is replaced in each regression by HDAC, HDWAC, HDCF, HDPROD, HDDISEXP, and HTOTALRM.

We prepare the sample for the above multivariate regressions, which is a subsample of the main sample, as follows. First, we restrict the sample to the period from 2005 to 2011 because going further back would make manual collection of data on compensation and corporate governance very difficult, as firms’ annual reports are no longer available online. Second, for all firm-years which remain in the main sample, we manually acquire data on corporate governance and compensation (as described above) from Bloomberg. Third, for those firm-years which do not have the additional data from Bloomberg, we retrieve their annual reports from the Key Note platform and manually collect the relevant data. Finally, firm-years which still have missing data after the above steps are excluded from the subsample for multivariate regression. This procedure yields a subsample of 2059 observations, smaller than the main sample due to the limited availability of corporate governance and compensation data, but still large enough for statistical inference. All continuous variables are winsorised at the 1st and 99th percentiles to mitigate the influence of outliers.

Table 6 presents the results of the multivariate regression test. The control variables generally have the predicted signs, i.e. measures of earnings management are negatively (positively) related to BOSIZE, BOIND and AUSIZE (DUALITY and PLCOM, respectively). The main focus is on ESCORE, which is shown to be significantly positively related to all measures of earnings management. After controlling for compensation and corporate governance, a one-unit increase in ESCORE is associated with an increase of 1.24% (2.54%) in ADAC (ATOTALRM), which is statistically significant at the 1% level. Similar conclusions about the positive relationship between ESCORE and the indicators of aggressive earnings management can be drawn from the results reported in Panel B of Table 6, with the coefficient on ESCORE being positive and significant across all regressions (the only exception being the regression of HDDISEXP on ESCORE). The evidence reinforces our earlier conclusion that ESCORE, although estimated differently from the scores of Beneish (1997, 1999) and Dechow et al. (2011), is still consistent with other traditional measures of earnings management.

Table 6 Measures of accruals management and real earnings management regressed on ESCORE and control variables

5.2.3 Other measures of earnings management

We have so far shown that ESCORE is consistent with other traditional measures of earnings management, including discretionary accruals and real earnings management proxies. Although these measures are the most popular ones in the earnings management literature, they are increasingly being subjected to criticism. For example, Dechow et al. (2010, p. 348) observe that ‘the majority of the studies… are about the determinants and consequences of abnormal accruals derived from accrual models, with the idea that abnormal accruals, whether they represent errors or bias, erode decision usefulness’. In other words, the literature over-relies on models, such as the accruals models, to disentangle the component of earnings subject to managers’ discretion from the ‘normal’ level of performance without fully appreciating that discretionary accrual is a ‘noisy’ measure of earnings management (for example, Holthausen et al., 1995; Fields et al., 2001; Ball, 2013; Owens et al., 2017). With the lack of a comprehensive theory on the accrual generating process (i.e. what accrual would be if there is no manipulation), as a profession we are using (allegedly) mis-specified models trying to measure the ‘immeasurable’ (McNichols, 2000; Dechow et al., 2010; Owens et al., 2017). In addition, some researchers raise a concern about the implausibly large magnitude and high frequency of earnings management documented in the extant literature using accruals models (Ball, 2013; Gerakos and Kovrijnykh, 2013). Ball (2013) ‘worries’ that the current practice that considers a positive (negative) discretionary accrual seems to create ‘the most incorrect belief’ that earnings management is ‘rife’ because technically ‘no observation sits exactly on the regression line’. Walker (2013) describes the existing statistical approaches of detecting earnings management as ‘good for rejecting a null that nobody believes is true’.

Given these criticisms, a valid concern would be that our evidence on the relationship between ESCORE and discretionary accruals and real earnings management proxies might be attributable to the mis-specification and measurement errors of the established models rather than a reflection of the association of ESCORE with actual earnings management. We address this concern and further test the effectiveness of ESCORE in capturing the context of earnings management by looking at how ESCORE is associated with two other measures of earnings management in the next two sub-sections.


Ex-post measure of earnings management

We first employ an ex-post measure of earnings management. In the UK, FRRP is responsible for ensuring financial statements of public companies, the main input to our ESCORE model, comply with applicable laws and financial reporting standards. FRRP selects firms for review based on some published criteria, including firms from specific sectors in the economy which are under particular stress, firms involved with special accounting issues which give rise to judgement, subjectivity and risk of misstatements as well as from complaints from the public, press or the accounting and financial community. As such, similar to the AAER and GAO samples of restatements in the US, the FRRP sample too is not free from selection bias. However, as each institution has a different sampling scheme, the evidence could reinforce each other and the limitations of each source could be mitigated.

If a firm is selected by FRRP for review, several steps are taken, including an initial review, formal and informal discussions before a Review Group being set up if necessary, then a thorough investigation followed by a recommendation to the FRRP chairman. A review may investigate one or more annual reports of the selected firm. At the end of the process, FRRP might decide whether it is suitable for a press notice or not. It is most likely that a press notice is issued in case the directors have agreed that the financial statements are defective and proposed corrective actions have been taken and that FRRP is satisfied with those actions.

From the above description, we define firm-years which are investigated by FRRP followed by a press notice as instances of earnings management. As shown in Panel A of Table 7, there are 70 annual reports with fiscal year ending between 1/1/1995 and 31/12/2012 which are subjected to FRRP press notices. We remove 37 firm-years which are in the financial and utility industries and do not have sufficient data to calculate ESCORE. The 33 remaining cases spread across 22 Datastream level-six industries.

Table 7 ESCORE of FRRP firms

If our ESCORE does capture the context of earnings management, we would expect the ESCORE of the 33 firms which are investigated by FRRP and subsequently receive a press notice (FRRP firms henceforth) to be significantly larger in the year subject to the investigation than in other years. To test this conjecture, we extract ESCORE (calculated using the whole sample as described in Sect. 3) of the 33 FRRP firms for the period from 1995 to 2012 to create a subsample of 576 firm-year observations. As shown in Panel B of Table 7, FRRP firms are generally larger than the average firm in the main sample (see Table 3), e.g. the mean MVE of FRRP firms is £2708 million compared to £390 million in the main sample. This suggests that the FRRP’s sampling method is biased towards larger firms, which typically play an important role in the economy and whose misstatements, if any, would have more pronounced effects on investors. The mean (standard deviation) of ESCORE in this subsample is 2.17 (1.47). We define the year for which the annual reports are investigated by FRRP as the restatement year. Panel C of Table 7 shows that the mean ESCORE of FRRP firms in restatement years (3.24) is significantly larger than that of the rest of the sample (2.10) at the 1% level. The magnitude of the difference (1.14) is also large, considering that the standard deviation of ESCORE in the subsample is only 1.47 and that of the main sample (see Table 3) is 1.73.

Ideally, we would have included ESCORE together with some control variables which are potentially related to restatements but not included in ESCORE, such as corporate governance and compensation variables, on the right-hand side of a logistic regression with the indicator of restatement on the left-hand side. However, further constraining the sample by requiring the availability of corporate governance and compensation data would leave a sample that is too small for reliable statistical inference. Instead, we estimate a logistic regression of the indicator of restatement (equal to one for firm-years which are investigated by FRRP followed by the release of a press notice, zero otherwise) on ESCORE only, with year and industry fixed effects included. The coefficient on ESCORE, as shown in Panel D of Table 7, is 0.5320 (significant at the 1% level). In terms of economic significance, a one-unit increase in ESCORE would increase the probability of a restatement by 2.02%. Compared to the unconditional probability of 5.73% (33/576), the effect of ESCORE on the likelihood of restatement is economically large. Overall, the evidence suggests that firms which are required to restate their financial statements, especially the income statements, generally have higher ESCORE. It further supports our claim that ESCORE captures the context of earnings management.

The 33 selected cases analysed above might involve different types of restatements which have different effects on financial statements. For each case, we read the FRRP press notice to determine the nature of the case and its effects on the firm’s financial statements. We apply the following codes to classify those effects: IS (if the restatement affects the income statement) and OT (if the restatement does not affect the income statement). Of the 33 cases selected, FRRP requires restatements involving items on the income statement (IS code) in only 12 cases (36%). Unreported results show that the mean ESCORE of FRRP firms with the IS code is 3.20, which is 1.05 units higher than that of the rest of the sample (significant at the 5% level).


Measure of earnings management based on Benford’s law

The ex-post measure of earnings management used in the previous section has the advantage of a low Type I error. However, as FRRP does not select firms for investigation randomly, it suffers from the issue of generalization discussed by Dechow et al. (2010). We, therefore, reinforce our evidence of the association between ESCORE and the context of earnings management by employing one more measure of earnings management which does not have the same pitfalls as the ex-post indicator of earnings restatement. In this section, we use an empirical measure of earnings management constructed on the basis of Benford’s law. Benford’s law refers to the observation that many real-life numerical datasets, accounting figures included, are distributed in such a way that the first digits are likely to be small. According to Benford’s law, the probability that the first digit of an accounting item (e.g. an asset or liability reported on the balance sheet or an income or expense on the income statement) is one is the highest (30.10%), followed by the probability of it being two (17.61%), three (12.49%), four (9.69%), five (7.92%), six (6.69%), seven (5.80%), eight (5.12%) and nine (4.58%). If a firm reports accounting figures in its financial statements which do not conform to this distribution, then that is ‘abnormal’. Amiram et al. (2015) show that the more the reported figures on financial statements deviate from the distribution expected by Benford’s law, the more likely the financial statements are to contain errors, regardless of whether these are unintentional errors or deliberate earnings management attempts. The advantage of this approach to capturing earnings management is that it does not suffer from the statistical biases experienced by the traditional models of discretionary accruals, nor does it rely on the problematic endogenous relationship between earnings management and a firm’s fundamental characteristics.
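
The first-digit probabilities quoted above follow from the standard closed form of Benford’s law, P(d) = log10(1 + 1/d). The short sketch below, whose function name is chosen for illustration only, evaluates that formula for each digit.

```python
import math

def benford_first_digit_probs():
    """Return {digit: probability} under Benford's law, P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for digit, p in probs.items():
    print(f"first digit {digit}: {p:.2%}")
```

The nine probabilities sum to one, and the probability declines steadily from digit one to digit nine.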

We, therefore, follow Amiram et al. (2015) to estimate an empirical measure of the deviation of the distribution of the first digits of financial statement items from the distribution expected by Benford’s law as follows. For this test, we collect additional data from Bloomberg. As a result, we have to further constrain the main sample to include only firms which are still listed on the London Stock Exchange as at the end of May 2017. From this initial list of live firms, we keep only the firm-year observations with sufficient data to calculate ESCORE. For each of those observations, we then download from Bloomberg all items on the balance sheet, income statement and cash flow statement. Missing items are replaced by zeros. In line with prior research (e.g. Amiram et al., 2015), firm-years with fewer than 50 items are dropped to avoid measurement errors. For each item, we keep only the first digit (ignoring the negative sign if an item has a negative value), except for items with an absolute value of less than one, for which the first non-zero digit is kept.

The above process results in a sample of 2373 firm-year observations, each having a pool of at least 50 integers ranging from one to nine representing the first digits of items reported on the firm’s balance sheet, income statement and cash flow statement in that year. We then follow Amiram et al. (2015) to calculate FSD_SCORE as follows:

$$FSD\_SCORE_{i,t} = \frac{{\mathop \sum \nolimits_{d = 1}^{9} \left| {AD_{d,i,t} - ED_{d} } \right|}}{9} \left( {d = 1, 2, \ldots , 9} \right)$$
(7)

where \(FSD\_SCORE_{i,t}\) is the deviation of the observed distribution of financial statement figures from the distribution expected by Benford’s law of firm i in year t; \(AD_{d,i,t}\) (d = 1, 2, …, 9) is the actual distribution of digit d, measured as the number of times d appears as the first digit of items reported on firm i's balance sheet, income statement and cash flow statement in year t divided by the total number of items reported on those statements; \(ED_{d}\) is the expected distribution of digit d under Benford’s law (i.e. \(ED_{1}\) = 0.3010, \(ED_{2}\) = 0.1761,\(ED_{3}\) = 0.1249,\(ED_{4}\) = 0.0969,\(ED_{5}\) = 0.0792,\(ED_{6}\) = 0.0669,\(ED_{7}\) = 0.0580,\(ED_{8}\) = 0.0512,\(ED_{9}\) = 0.0458).
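
Eq. (7) can be illustrated with a minimal sketch for a single firm-year. The function names are hypothetical, and the sketch makes one assumption that the text does not spell out: items equal to zero are skipped, because they have no leading non-zero digit, so the denominator is the number of non-zero items.

```python
import math
from collections import Counter

# Expected first-digit shares ED_d under Benford's law, d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """First non-zero digit of |x|, or None when x is zero."""
    x = abs(x)
    if x == 0:
        return None
    # Scientific notation places the leading significant digit first,
    # which also handles values below one (e.g. 0.0042 -> 4).
    return int(f"{x:.15e}"[0])

def fsd_score(items):
    """Mean absolute deviation between actual and Benford first-digit shares.

    Assumption: zero-valued items are ignored (illustrative choice).
    """
    digits = [d for d in map(first_digit, items) if d is not None]
    if not digits:
        raise ValueError("no non-zero items to score")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - BENFORD[d - 1])
               for d in range(1, 10)) / 9

# Hypothetical firm-year in which every non-zero item starts with digit 1.
example = fsd_score([100, -150, 0.0012, 0])
print(round(example, 4))
```

A firm-year whose first digits match Benford’s law exactly would score zero; larger values indicate a more anomalous distribution.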

We use FSD_SCORE as a proxy for earnings management and examine how ESCORE is associated with FSD_SCORE by estimating the following regression:

$$\begin{aligned} FSD\_SCORE_{i,t} = &\, \alpha + \beta_{1} BOSIZE_{i,t} + \beta_{2} BOIND_{i,t} + \beta_{3} AUSIZE_{i,t} \\ & + \beta_{4} DUALITY_{i,t} + \beta_{5} PLCOM_{i,t} + \beta_{6} ESCORE_{i,t} \\ & + \text{Year Fixed Effects} + \text{Industry Fixed Effects} + \varepsilon \\ \end{aligned}$$
(8)

The results are presented in Table 8. ESCORE is positively and significantly (at 1% level) related to FSD_SCORE. The evidence, therefore, suggests that firms with higher ESCORE tend to have financial statement figures being distributed more anomalously given the expectation under Benford’s law. It further validates that ESCORE is a good measure of the context of earnings management.

Table 8 FSD_SCORE regressed on ESCORE and control variables (n = 2373)

5.3 ESCORE and stock returns

We have established that ESCORE is able to capture the context of earnings management. The next question is whether investors misprice that context. Table 9 reports the buy-and-hold returns on each ESCORE portfolio (0–9), on the low, medium and high ESCORE portfolios, and on the hedge portfolio. The t-statistics are reported under the null hypothesis that the corresponding return is zero. The results are easy to summarise. First, as ESCORE increases, all measures of stock returns decrease monotonically. Second, low ESCORE stocks earn abnormally high returns and high ESCORE stocks earn abnormally low returns. Third, the hedge portfolio earns positive abnormal returns.

Table 9 Stock returns across ESCORE groups

Since the results are consistent across different return metrics, we discuss the results based only on the abnormal returns estimated using the four-factor model. The portfolio of stocks with ESCORE of zero earns an abnormal return of 0.33% per month (significant at 5% level). As ESCORE increases, abnormal returns decrease monotonically. The High ESCORE portfolio (includes all stocks with ESCORE of six or higher) earns an abnormal return of –1.04% per month (significant at 1% level). The hedge portfolio that takes long position in Low ESCORE stocks and short position in High ESCORE stocks earns 1.37% abnormal return per month. To put the results in perspective, we compare our findings to other similar return anomalies documented in the literature. For example, Sloan (1996) documents an annual size-adjusted return of 10.4% on a hedge portfolio which takes long position in stocks with low and short position in those with high accruals. Soares and Stark (2009) provide similar results showing that the accruals anomaly exists in the UK with the hedge portfolio earning an abnormal return (adjusted for size and book-to-market factors but without controlling for transaction costs) of 18.7% per year. The annualised return on the hedge portfolio based on our ESCORE is 17.74% (1.0137^12 – 1), which is non-trivial in economic terms. Overall, the result cannot reject our hypothesis suggesting that the market misprices the information contained in the ESCORE, which is designed to capture the context of earnings management.
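
The hedge-portfolio arithmetic in this paragraph can be retraced with a short sketch. It assumes, as the expression (1.0137^12 – 1) implies, that the monthly abnormal return is compounded over twelve months; the variable and function names are illustrative only.

```python
def annualise_monthly(monthly_return, periods=12):
    """Compound a monthly return over the given number of periods."""
    return (1 + monthly_return) ** periods - 1

low_monthly = 0.0033    # ESCORE = 0 portfolio, four-factor abnormal return
high_monthly = -0.0104  # High ESCORE portfolio, four-factor abnormal return

# Long Low ESCORE stocks, short High ESCORE stocks.
hedge_monthly = low_monthly - high_monthly
hedge_annual = annualise_monthly(hedge_monthly)
print(f"{hedge_monthly:.2%} per month, {hedge_annual:.2%} per year")
```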

5.4 Is there another ‘market anomaly’ in disguise?

The results from the portfolio analysis strongly suggest that ESCORE could predict future stock returns. However, there may be some other known ‘market anomalies’ associated with ESCORE which could partly explain the predictive power of ESCORE. This section addresses such concerns.

To see if ESCORE is indeed related to other known patterns in realised returns, Table 10 presents fundamental characteristics of stocks across ESCORE groups. Firm size, measured by either total assets (AT) or market capitalization (MVE), is inversely related with ESCORE. Firms with higher ESCORE are also more likely to issue seasoned equity and debt and have lower NOA. High ESCORE firms are also highly valued by the market evidenced by the monotonic increase of the market-to-book ratio across the ESCORE groups. The decrease in ROA and DROA as ESCORE increases also suggests that high ESCORE stocks are typically less profitable. High ESCORE stocks are also more financially distressed as measured by the ZSCORE.

Table 10 Fundamental characteristics across ESCORE groups

The above patterns raise a concern whether ESCORE could predict future returns beyond the known return effects embedded in it. To start with, we construct ESCORE based on prior literature on earnings management, and not from market anomalies’ literature. Therefore, the signals embedded in ESCORE do not necessarily include only those factors which are known as stock return predictors. We argue that the predictive power of ESCORE comes from the context of earnings management which is revealed collectively by the composite ESCORE, and not by the predictive power of the individual signals separately. In fact, the established literature even suggests that some signals, including ESIZE and EBLOAT, would predict future returns in the opposite direction. Particularly, based on the established evidence of the size effect (e.g. Banz 1981) and the irrational market reaction to balance sheet bloat (e.g. Hirshleifer et al. 2004; Gray et al. 2018), stocks with ESIZE and EBLOAT of one (smaller stocks and those which have smaller NOA) are expected to earn higher (not lower) future returns. Meanwhile, the literature is silent about whether other signals, namely EROA, EDROA, EDIV, EDEBT, EDDEBT, EMA, ECYCLE, EAUDIT and EBT, could predict future returns or not. The concern lies, therefore, mainly with the high market-to-book ratio, high likelihood of issuing seasoned equity, more financial distress and low profitability of high ESCORE stocks. Prior research widely documents that abnormally low returns are associated with high market-to-book firms (e.g. Fama and French 1992; Lakonishok et al. 1994), seasoned equity offers (e.g. Loughran and Ritter 1995; Spiess and Affleck-Graves 1995; Chen et al. 2019), firms with negative ZSCORE (e.g. Agarwal and Taffler 2008), and firms with lower profitability (e.g. Ou and Penman 1989a, b; Piotroski 2000, Fama and French 2006). These known patterns of returns are embedded in ESCORE through EOV, ESEO and EDISTRESS. 
In addition, as ESCORE is designed to capture the context of earnings management, it is also important to control for the documented market mispricing of discretionary accruals (Xie 2001).

While previous studies (such as Jiang et al. 2016) document a number of other risk factors, we focus on factors which might already be embedded in ESCORE, as identified in the preceding analyses. To demonstrate that ESCORE is still significantly associated with future returns after controlling for the above five anomalies, we estimate the following regression using the Fama–MacBeth methodology, with the t-statistics calculated using Newey–West corrected standard errors:

$$\begin{aligned} RET_{i,t + 1}^{a} = & \alpha + \beta_{1} Ln\left( {MVE_{i,t} } \right) + \beta_{2} MTB_{i,t} + \beta_{3} ROA_{i,t} + \beta_{4} ESEO_{i,t} \\ & + \beta_{5} EDISTRESS_{i,t} + \beta_{6} NOA_{i,t} + \beta_{7} DAC_{i,t} + \gamma ESCORE_{i,t} + \varepsilon \\ \end{aligned}$$
(9)

where \(RET_{i,t + 1}^{a}\) is annual buy-and-hold return measured from June of year t + 1 to May of year t + 2 and is replaced by \(BHRR_{i,t + 1}^{a}\), \(BHSAR_{i,t + 1}^{a}\) and \(BHAR4F_{i,t + 1}^{a}\).
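
For readers unfamiliar with the estimation procedure, the following is a minimal sketch of a Fama–MacBeth regression with Newey–West standard errors, not the authors’ implementation. It runs one cross-sectional OLS per period, averages the coefficients over time, and applies a Bartlett-kernel correction to the time series of coefficients. The lag length (one), the function name and the simulated panel are assumptions made for illustration; the paper does not state the lag length used.

```python
import numpy as np

def fama_macbeth(y, X, period, lags=1):
    """Fama-MacBeth coefficients and Newey-West t-statistics.

    y: (n,) returns; X: (n, k) regressors without a constant;
    period: (n,) period labels. Output vectors list the intercept first.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    period = np.asarray(period)
    betas = []
    for p in np.unique(period):  # np.unique returns periods in sorted order
        mask = period == p
        Z = np.column_stack([np.ones(mask.sum()), X[mask]])
        b, *_ = np.linalg.lstsq(Z, y[mask], rcond=None)  # cross-sectional OLS
        betas.append(b)
    B = np.vstack(betas)  # (T, k + 1) time series of coefficients
    T = B.shape[0]
    mean = B.mean(axis=0)
    U = B - mean
    # Long-run variance of each coefficient series with Bartlett weights.
    lrv = (U * U).sum(axis=0) / T
    for lag in range(1, lags + 1):
        weight = 1 - lag / (lags + 1)
        lrv += 2 * weight * (U[lag:] * U[:-lag]).sum(axis=0) / T
    se = np.sqrt(lrv / T)
    return mean, mean / se

# Simulated (hypothetical) panel: 20 years of 200 stocks, true slope -0.014.
rng = np.random.default_rng(0)
T, N = 20, 200
period = np.repeat(np.arange(T), N)
escore = rng.integers(0, 10, size=T * N).astype(float)
ret = 0.10 - 0.014 * escore + rng.normal(0.0, 0.30, size=T * N)
coef, tstat = fama_macbeth(ret, escore.reshape(-1, 1), period)
print(coef, tstat)
```

In an application to Eq. (9), `X` would hold the control variables, DAC and ESCORE, and `period` would index the annual portfolio formation dates.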

Table 11 presents the results of estimating Eq. (9) along with four other specifications where we exclude ESCORE and DAC one-by-one and together as well as the last specification where we only keep ESCORE and DAC as explanatory variables. Each panel reports the results of a return metric.

Table 11 Stock returns regressed on DAC, ESCORE and control variables

In Table 11, all control variables have the predicted signs. The coefficient on DAC is always negative and significant, which is in line with the existing literature (e.g. Xie 2001). The focus of the paper is the coefficient on ESCORE, which is negative and significant in all specifications. We, therefore, argue that ESCORE can predict stock returns beyond the existing anomalies. From specification (4) in Panel C of Table 11, a one-unit increase in ESCORE pulls annual four-factor risk-adjusted returns down by 1.40%. By comparison, in the portfolio analysis, where we do not control for other market ‘anomalies’, the annualised buy-and-hold four-factor risk-adjusted return of the hedge portfolio reported in Table 9 is 17.74% (1.0137^12 – 1). The average ESCORE of the low ESCORE portfolio is 0 and that of the high ESCORE portfolio is 6.56 [(519 × 6 + 232 × 7 + 88 × 8 + 26 × 9) / 865], yielding a difference of –6.56. Therefore, after adjusting for other known market anomalies, the four-factor risk-adjusted return on the hedge portfolio shrinks from 17.74% to 9.18% per year (1.40 × 6.56), which is still significant in economic terms.
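
The back-of-the-envelope comparison above can be retraced as follows. The sketch takes the weighted mean ESCORE over the four stock counts listed in the text, rounds it to two decimals as the text does, and multiplies it by the per-unit coefficient; the variable names are illustrative.

```python
# Number of stocks at each ESCORE value in the high ESCORE portfolio (from the text).
counts = {6: 519, 7: 232, 8: 88, 9: 26}

n_high = sum(counts.values())
mean_high = sum(score * n for score, n in counts.items()) / n_high
mean_low = 0.0  # the low ESCORE portfolio contains only ESCORE = 0 stocks

coef_per_unit = 0.0140  # annual return effect per ESCORE unit, specification (4)

# Round the mean to two decimals first, mirroring the 1.40 x 6.56 calculation.
implied_spread = coef_per_unit * (round(mean_high, 2) - mean_low)
print(f"mean ESCORE (high) = {mean_high:.2f}, implied spread = {implied_spread:.2%}")
```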

One issue with the above multivariate regression is the correlation between the control variables and ESCORE, as highlighted in Table 10. We respond to this issue in two ways. First, we drop the control variables in Eq. (9) one at a time, one pair at a time, and all together. For brevity, we only report the result when all control variables are dropped (specification (5) in Table 11). In all of those specifications, the main conclusions of the paper remain unchanged.

Another way to deal with this issue is to exclude ESEO, EDISTRESS and EOV from the construction of ESCORE. We calculate four compressed versions of ESCORE in which ESEO, EDISTRESS and EOV are dropped one by one from the construction of ESCORE, and all together. We then redo the returns analysis and perform multivariate regressions. Untabulated results confirm that none of the main results change qualitatively. The hedge portfolio, using ESCORE without ESEO, EDISTRESS and EOV, yields an average \(BHSAR^{m}\) (\(BHAR4F^{m}\)) of 0.92% (1.01%, respectively) per month, all statistically significant at conventional levels. Using the compressed ESCORE without ESEO, EDISTRESS and EOV to estimate Eq. (9), the coefficient on ESCORE is –0.0172 (–0.0138) when \(BHSAR^{a}\) (\(BHAR4F^{a}\), respectively) is the dependent variable, all being statistically significant at conventional levels. We therefore conclude that the power of the ESCORE to predict future returns goes beyond the known patterns of returns related to other known market anomalies.

5.5 Generalization of the results to other markets

Although our use of the UK market as the setting is well justified and makes a significant contribution to the advancement of knowledge in this strand of research, a valid concern is whether the results of this study are generalizable to other markets, especially the US, on which many previous studies in this area focus. To address this concern, we replicate some of the main analyses in the paper using a sample of US (live and dead) listed stocks during the period from 1987 to 2013. The data collection and sampling procedures are the same as those used for the main tests. The US sample contains 87,645 observations. We winsorise all continuous variables at the 1st and 99th percentiles to mitigate the influence of outliers.

We first replicate the results reported in Table 5. Untabulated results show that all measures of earnings management increase monotonically as ESCORE increases. The means of ADAC and HDAC of high-ESCORE stocks (7757 observations) are 0.1526 and 0.2858, respectively, while the corresponding means of low-ESCORE stocks (7986 observations) are 0.0670 and 0.0995. The differences are both economically large and statistically significant at the 1% level. Turning to real earnings management, the means of ATOTALRM and HTOTALRM of high-ESCORE stocks (0.6219 and 3.3019, respectively) are also significantly higher than those of their low-ESCORE counterparts (at the 1% level). We then replicate Table 9 using raw and market-adjusted returns. The low-ESCORE portfolio earns an annualised buy-and-hold market-adjusted return of 8.78%, which is significantly higher than the −9.8% earned by the high-ESCORE portfolio. The hedge portfolio which takes a long position in low-ESCORE stocks and a short position in high-ESCORE stocks earns a return of 18.58% per year after controlling for the market factor, which is significant at the 1% level and economically large. We also replicate Table 11 using both Fama–MacBeth and pooled regressions of raw and market-adjusted returns on ESCORE and the control variables. Again, all main conclusions from the main tests are generalizable to the US market. After controlling for size, book-to-market, profitability, seasoned equity, financial distress, balance sheet bloat and discretionary accruals, a one-unit increase in ESCORE pulls the annual buy-and-hold market-adjusted returns down by 2.6%, which is statistically significant at the 1% level. In general, while we invite future research to replicate other aspects of our analyses by pooling in more data (e.g. compensation, corporate governance and restatements in the US or in another market), the restricted replication reported in this section provides some reassurance that our main findings are generalizable to other developed markets, especially the US.

6 Conclusions

This study demonstrates that an index named ESCORE, which accumulates 15 individual financial-statement-based signals, can capture the context of earnings management and reliably predict future stock returns. We show that ESCORE is related to both accruals-based and real earnings management, measured in various ways which are widely used in the extant literature. Firms which are required to restate their financial statements by the Financial Reporting Review Panel in the UK also exhibit a significantly higher ESCORE in the restatement year. Firms with higher ESCORE also report financial statement figures whose distribution diverges more from what is expected under Benford’s law. Having established that ESCORE is effective in capturing the context of earnings management, we use ESCORE to form portfolios and find that, after adjusting for market, size, book-to-market and momentum factors, low ESCORE stocks outperform high ESCORE stocks by 1.37% per month, which is both statistically and economically significant. We also report that the main findings of the paper are generalizable to the US market.

The paper makes several contributions to the literature. First, ESCORE could be used as an alternative empirical proxy for earnings management. The appeal of ESCORE is that it allows financial statement users to assess the reliability of reported earnings by looking at the surrounding context rather than the magnitude of the actual earnings and its components. In addition, ESCORE captures both accruals-based and real earnings management as well as financial reporting violations which require restatements. The ESCORE is, hence, particularly advantageous to use in subsequent studies as it only focuses on earnings management and makes no prediction regarding which methods have been used to manage earnings.

The second and more important contribution of the paper is that ESCORE can be applied by investors to screen out the information about the context of earnings management which is mispriced by the market, and hence earn (avoid) economically large abnormal returns (losses). We also add to the literature on fundamentals-based strategies by showing that the dimension of financial statement information which suggests earnings management exists is also mispriced. ESCORE has undeniably not been designed to capture all signals of suspicious earnings management. We deliberately focus on the context which could be easily extracted from annual financial statements, hence the exclusion of areas such as performance-linked compensation, institutional holding, and corporate governance. The reason is twofold. First, we propose a parsimonious model which covers a broad range of signals for which data can be easily obtained in practice. Second, we want to avoid the constraint of data availability which could severely depress the sample size if compensation, institutional holding, and corporate governance variables are included. Dechow et al. (2011) argue that the inclusion of such variables would introduce biases into the sample due to data unavailability. Nevertheless, we feel that these omissions do not affect the main conclusions of the paper and invite future research to expand our model to cover these aspects of the context of earnings management. Besides, the ESCORE is constructed as the sum of 15 signals each carrying an equal weight. This approach is suitable to create a composite score that accumulates various aspects of the general context of earnings management and the resulting ESCORE is relatively easy to construct and understand. Nevertheless, although all selected signals are valid predictors of earnings management, the equal-weighted approach does not assess the relative importance of each signal. 
An alternative index construction approach which could deal with this issue is a regression, but only if one could identify a suitable left-hand-side variable. Moreover, the conversion of some of the components of ESCORE from continuous variables to binary signals could potentially be problematic. For example, managers could still manage earnings while keeping those values from looking extremely high or low, and hence avoid being flagged by the ESCORE. While we still assert that the ESCORE as designed in this paper is suitable for the intended purposes, developing a weighted and continuous model to capture the context of earnings management is an interesting avenue for future research.