Can the FSCORE add value to anomaly-based portfolios? A reality check in the German stock market

This paper examines the added value of using financial statement information, particularly that of Piotroski’s (J Account Res 38:1, 2000. https://doi.org/10.2307/2672906) FSCORE, for equity portfolio selection in the German stock market in a realistic research setting in which the critique against the implementability of FSCORE-based trading strategies is taken into account. We show that the performance of annually rebalanced long-only portfolios formed on any of the examined 12 accounting-based primary criteria improves by including the FSCORE as a supplementary criterion. Our study is the first to show that although the FSCORE boost is strongest for the 1-year holding period length, it also holds, on average, for the 3-year holding period. The use of a 3-year updating frequency is particularly beneficial for the low-accrual portfolio that—when supplemented with the high-FSCORE threshold—generates the best overall performance among all 75 portfolios examined. Moreover, we show that a high FSCORE is also an efficient stand-alone criterion for long-only portfolio formation.


Introduction
Recent evidence has shown that Piotroski's (2000) FSCORE, which employs commonly used financial statement variables to identify stocks with high fundamental strength (footnote 1), is among the most efficient quality criteria in building combined value-quality equity portfolios (e.g., see Piotroski and So 2012; Walkshäusl 2017). It is also widely used in the investment industry. On the other hand, the real-world implementability of trading strategies based on the FSCORE has been questioned by some scholars; for example, Kim and Lee (2014) state that both the level and the significance of abnormal returns reported in the seminal study by Piotroski (2000) are severely overstated due to a look-ahead bias in his research design. Evidence has also shown that the efficacy of combined value and FSCORE criteria is at its greatest in small-cap universes of stocks (e.g., Piotroski 2000; Novy-Marx 2014). It has also been argued that most of the related abnormal returns have come from the short side (e.g., see Hyde 2018), where transaction costs are typically higher than those on the long side (e.g., see D'Avolio 2002; Richardson et al. 2010). For these reasons, such returns could be unachievable for larger-scale investors, who often face short-sale impediments (e.g., see Shleifer and Vishny 1997; Beneish et al. 2015). In addition, the price impact of higher-volume market orders is stronger among small-cap stocks, thereby hampering the real-world implementability of such investment strategies, whose performance is markedly better in the small-cap stock universe than in the larger-cap universe (Soares and Stark 2009).
Recently, Pätäri et al. (2018c) have examined the added value of using Piotroski's FSCORE as a supplementary long-only selection criterion alongside the 12 primary criteria that were based on previously documented accounting anomalies (most of the employed primary criteria were based on value anomalies). Their results from the German stock market over the period 2000-2015 show that the performance boost from the inclusion of the high-FSCORE threshold is remarkable for not only the trading portfolio of high book-to-price (B/P) stocks, but also for all the other 11 anomaly-based trading portfolios being examined. In addition, the inclusion of the high-FSCORE threshold alongside a primary criterion tended to decrease the small-cap exposure of the long-only portfolios, which is in line with recent U.S. evidence provided by Novy-Marx (2014). However, Pätäri et al. (2018c) do not take transaction costs into account. In addition, they use a very low market-cap threshold of €5 million to exclude only the tiniest microcap stocks. This could also partially explain their striking results if the first-stage portfolios consisting of the most attractive tertile of stocks based on the primary criterion included a high proportion of such microcaps (footnote 2). To check whether their results are driven by these two research design choices, we include transaction costs and raise the market-cap threshold to €50 million, consistent with Schmidt (2017).
Footnote 1: Based on nine binary signals designed to measure three different dimensions (i.e., profitability, change in financial leverage/liquidity, and change in operational efficiency) of firms' fundamental strength, Piotroski's FSCORE forms a composite quality score within a range of 0 to 9 (see "Appendix 1" for the calculation principles of the nine binary signals, whose realization is classified as either "good" or "bad," depending on each signal's implication for future profitability and cash flows). An indicator variable for each signal is set equal to one (zero) if the signal's realization is good (bad). The prediction power of the FSCORE on future returns is based on empirical evidence that if the number of positive binary signals is high (low) enough, the stocks of such firms will, on average, generate positive (negative) excess returns above (below) the stock market average in the future as a result of improvement (deterioration) in firms' financial health (e.g., see Piotroski and So 2012).
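The nine-signal composite described in the footnote on the FSCORE can be illustrated with a short sketch. It assumes the signal definitions commonly attributed to Piotroski (2000); the exact calculation principles used in this paper are those of "Appendix 1", which is not reproduced here. The `FirmYear` container, its field names, and the `is_high_fscore` helper are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Hypothetical inputs for one firm-year (field names are illustrative)."""
    roa: float                  # return on assets, current year
    roa_prev: float             # return on assets, previous year
    cfo: float                  # operating cash flow scaled by total assets
    leverage: float             # long-term debt to assets, current year
    leverage_prev: float
    current_ratio: float
    current_ratio_prev: float
    shares_issued: bool         # True if common equity was issued in the year
    gross_margin: float
    gross_margin_prev: float
    asset_turnover: float
    asset_turnover_prev: float


def fscore(f: FirmYear) -> int:
    """Sum of nine binary signals (1 = "good", 0 = "bad"), giving a 0-9 score."""
    signals = [
        # Profitability
        f.roa > 0,
        f.cfo > 0,
        f.roa > f.roa_prev,
        f.cfo > f.roa,                          # cash flow exceeds earnings
        # Change in financial leverage / liquidity
        f.leverage < f.leverage_prev,
        f.current_ratio > f.current_ratio_prev,
        not f.shares_issued,
        # Change in operational efficiency
        f.gross_margin > f.gross_margin_prev,
        f.asset_turnover > f.asset_turnover_prev,
    ]
    return sum(signals)


def is_high_fscore(score: int, threshold: int = 7) -> bool:
    """High-FSCORE flag; the default of 7 mirrors the lower boundary used in this study."""
    return score >= threshold
```

Raising `threshold` to 8 would correspond to the stricter boundary used by Piotroski (2000) and Pätäri et al. (2018c).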
In line with Piotroski's (2000) seminal study, Pätäri et al. (2018c) use an FSCORE of 8 as a lower boundary to distinguish high-FSCORE stocks. This results in relatively narrow portfolios during the holding periods following those portfolio-reformation points at which the total number of high-FSCORE stocks in the tertile portfolio formed on the primary criterion was low. In accordance with recent studies on the FSCORE (e.g., see Choi and Sias 2012; Piotroski and So 2012; Zhu et al. 2019; Ng and Shen 2020; Walkshäusl 2020), we extend the group of high-FSCORE stocks to also include stocks with an FSCORE of 7 to keep the FSCORE-related portfolios sufficiently diversified throughout the sample period. This extension is particularly important because raising the market-cap threshold substantially narrows the sample. In addition, recent aggregate evidence seems to support the use of 7 instead of 8 as a lower boundary for the high-FSCORE criterion (see Piotroski and So 2012; Duong et al. 2014; Broussard et al. 2016; Hyde 2018).
German stocks provide an interesting setting for this type of research design, as during the 15-year sample period from May 2000 to April 2015, the average stock market return in Germany has been only 4.81% p.a., being far below its long-term historical average. In addition, the value anomaly is documented to have been relatively weak in Germany (e.g., see Capaul et al. 1993; Fama and French 1998; Artmann et al. 2012), whereas the German size premium (calculated analogously to the Fama-French [1993] U.S. size factor) has been negative (−3.64% p.a.) during the sample period. Therefore, it is interesting to examine whether the high-FSCORE criterion can add value to equity portfolio selection in such challenging conditions after taking account of transaction costs, lowering the high-FSCORE threshold to 7, and excluding microcap firms. Further motivation for focusing on this particular stock market is provided by the evidence for high earnings management among German firms (e.g., see Leuz et al. 2003; van Tendeloo and Vanstraelen 2005; Haw et al. 2012). In light of such evidence, it is also likely that the divergence in investors' opinions on fair values of firms is greater than it is in the countries where earnings management is less common. Consequently, higher potential for finding mispriced stocks would offer more profitable trading opportunities for the investors who are not misguided by earnings management activities.
Because earlier evidence has shown that the optimal holding period length for value portfolios may be remarkably longer than the most commonly used 1-year period (e.g., see Lakonishok et al. 1994; Bird and Casavecchia 2007), we also compare the performance of portfolios that are re-formed less frequently than once a year to the performance of annually re-formed portfolios. To the best of our knowledge, the performance impact of extending holding periods beyond the 1-year horizon has not been examined before on a risk-adjusted basis in the context of FSCORE-boosted portfolios or plain FSCORE portfolios, and even corresponding comparisons of raw returns are rare in the related literature. This research gap may have remained due to the results of Piotroski (2000), according to whom raw returns were higher on both long-only and long-short bases for annually re-formed FSCORE-boosted portfolios than for corresponding portfolios re-formed every other year (footnote 3). Although the FSCORE can be interpreted as a proxy for fundamental momentum (e.g., see Ahmed and Safdar 2018), the potential mispricing (footnote 4) of high-FSCORE value stocks will not necessarily right itself within the 1-year holding period to the extent that it would not be reasonable to hold such stocks for a longer period. Although they would no longer be among the most heavily underpriced stocks, their price recovery could continue for several years, just as has been documented for portfolios formed on value-only criteria (e.g., see Lakonishok et al. 1994; Bird and Whitaker 2003). Therefore, it is worthwhile to also examine the impact of the use of the FSCORE on portfolio performance for holding periods longer than one year.
The portfolio formation frequency is also related to performance differences explained by differences in transaction costs. Ceteris paribus, less frequent portfolio reformation implies lower transaction costs. On the other hand, in principle, it could be possible that less frequent updating would result in more radical changes in portfolio contents to the extent that most of the benefits from less frequent trading would be mitigated. In spite of the importance of the impact of transaction costs on portfolio performance comparisons (e.g., see Soares and Stark 2009; Richardson et al. 2010; Zaremba and Andreu 2018), almost all earlier peer-reviewed studies on the FSCORE have omitted transaction costs.
Our main results can be summarized as follows: By taking into account the critique against the real-world implementability of FSCORE-based trading strategies, we find strong evidence indicating that during the period from May 2000 to April 2015, when the average stock market return was exceptionally low in Germany, the performance of long-only portfolios re-formed annually on all 12 accounting-based portfolio-formation criteria examined (henceforth "primary criteria") would have, without exception, improved as a result of including the FSCORE as a supplementary criterion. For many of the FSCORE-boosted portfolios re-formed on either an annual basis or every third year, their risk-adjusted outperformance against the comparable primary portfolios is also statistically significant. In addition, all such FSCORE-boosted portfolios would have significantly outperformed the market portfolio in terms of total risk-adjusted returns. We also contribute to the existing literature by showing that in spite of the FSCORE boost being at its strongest for the 1-year holding period length, it also holds for the 3-year holding period. To the best of our knowledge, our study is the first to show that the FSCORE boost extends to 3-year horizons. The use of a 3-year holding period is particularly beneficial for the low-accrual portfolio that, when supplemented with the high-FSCORE threshold, generates the best overall performance among all 75 portfolios examined. Moreover, we show that the FSCORE is also efficient as a stand-alone criterion for long-only portfolio formation, since based on almost all risk-adjusted performance metrics employed in the comparisons between the portfolios of equal holding period lengths, high-FSCORE portfolios outperform all the portfolios formed on the 12 primary criteria during the 15-year sample period.
The decomposition of the full-sample-period performance into separate bull- and bear-period performances reveals that the better performance of FSCORE-related portfolios is mostly attributable to their tendency to decline far less in bearish conditions than do the primary portfolios and the market portfolio, in particular. On average, the FSCORE-boosted portfolios re-formed annually and every third year also generate higher returns in bullish conditions than the primary portfolios and the market portfolio, but the comparable monthly return differences in favor of the FSCORE-boosted portfolios are remarkably smaller in bullish than in bearish conditions. Their defensive characteristic is also confirmed by the maximum drawdown statistics, the averages of which are clearly smallest for the annually re-formed FSCORE-boosted portfolios, thereby indicating that for the shortest holding period length being examined, the inclusion of the high-FSCORE threshold alongside a primary criterion also decreases the crash risk exposure of the related portfolios. However, the same does not hold for the holding period lengths of three and five years. In general, our results show that even after controlling for the most commonly adduced explanations for the anomalous character of the FSCORE, its added value extends far beyond what was originally documented in the seminal study of Piotroski (2000), in which it was used for selecting financially strong firms among high-B/P ones.
The structure of the remainder of the paper is as follows: Sect. 2 discusses the rationales upon which the prediction power of the examined portfolio-formation criteria on future stock returns can be based. Section 3 describes the data and the methodology, while Sect. 4 introduces the empirical results. Section 5 concludes with implications for further research.

Theoretical motivation for the existence of accounting anomalies
It is well-documented in both accounting and financial literature that fundamental strength measures have prediction power on future returns (e.g., see Holthausen and Larcker 1992; Abarbanell and Bushee 1998; Piotroski 2000; Kumsta and Vivian 2020; Ng and Shen 2020). The same also holds for the prediction power of valuation ratios (e.g., see Pätäri et al. 2018b), accruals-to-assets (e.g., see Ohlson and Bilinski 2015), and many combination criteria formed on the basis of two or more stand-alone criteria (e.g., see Bartov and Kim 2004; Yan and Zheng 2017). With respect to the theoretical reasons for the existence of the examined anomalies, the related literature can be divided into two strands, of which the first aims to explain anomalies by mispricing stemming from investors' irrational or uninformed behavior. For value anomalies, many behavioral explanations have been suggested. For example, Lakonishok et al. (1994) find that the value premium stems from investors' tendency to extrapolate past growth too far in the future, resulting in underpricing of low-growth value stocks and overpricing of high-growth glamour stocks. This tendency is reinforced by analysts' biased earnings forecasts that are too pessimistic for value stocks and too optimistic for glamour firms, as documented by Engelberg et al. (2018) (footnote 5). In the case of the accrual anomaly, its first behavioral explanation was tested in Sloan's (1996) seminal study, which concludes that investors fixate too heavily on earnings reported in income statements, thereby underweighting the information embedded in accruals. Also, Sloan's (1996) finding that anomalous returns related to accruals are concentrated around subsequent earnings announcements supports mispricing-based explanations. In addition, Bradshaw et al. (2001) show that analysts' earnings forecasts are more optimistic for firms with a high level of accruals than for firms with a low level of accruals.
In the light of So's (2013) evidence of investors' tendency to overweight analysts' forecasts, the divergence in analysts' earnings estimation errors between high- and low-accrual firms may also induce the accrual anomaly, similarly to how it may magnify value anomalies. Moreover, firms manage earnings using accruals in order to influence investors' perceptions (Cohen and Zarowin 2010), thereby increasing the likelihood for the existence and magnitude of mispricing.
The behavioral explanation for the positive relation of the FSCORE with subsequent returns is often interpreted to stem from the slow incorporation of public information into stock prices, or more specifically, from delays in investors' responses to changes in earnings forecasts or future cash flows (e.g., see Piotroski 2000; Elgers et al. 2001) (footnote 6). Based on the differences in the speed of investors' reactions to the revised expectations, it is possible to divide the investors into sub-groups: those who trade on public information that is not yet fully priced (more sophisticated investors) and those who act as counterparts for the trades implemented by the first sub-group of investors because of the gradual dissemination of public information (less sophisticated investors). As sophisticated investors' demand (supply) must be offset by less sophisticated investors' supply (demand), the purchases (sales) of the underpriced (overpriced) stocks of the financially strong (weak) firms by sophisticated investors drive the prices of such stocks higher (lower).
According to the prior literature (e.g., see Nofsinger and Sias 1999; Cohen et al. 2002), sophisticated investors are mostly institutional investors, whereas individual investors are, on average, less sophisticated. Based on this classification, Choi and Sias (2012) show that financial strength measured by Piotroski's FSCORE predicts both subsequent returns and institutional investors' demand. In addition, they show that the relation between the FSCORE and subsequent returns is at least partially driven by subsequent institutional demand driving subsequent returns. Ng and Shen (2020) document parallel findings for the five Asian national markets (i.e., Hong Kong, Japan, Korea, Singapore and Taiwan). The reported interlinkages between institutional demand, the FSCORE and returns are particularly interesting in light of the seminal work of Piotroski (2000), according to whom the benefits of using the FSCORE screen were concentrated in small- and medium-sized firms with low share turnover and firms with no analyst coverage. However, it should be noted that Piotroski's conclusions are drawn within the universe of high-B/P stocks, whereas in Choi and Sias' (2012) and in Ng and Shen's (2020) papers, the stock universe is not limited to high-B/P firms. Also, the time span differences between the sample periods in these three studies may partially explain the divergence of their results.
Although mispricing-based explanations are relatively similar across all the accounting anomalies being examined in this paper, their risk-based explanations are much more divergent. For example, the value premium has been explained by higher distress risk of value firms relative to glamour firms (e.g., see Fama and French 1992; Vassalou and Xing 2004), higher operating leverage of value firms (e.g., see Zhang 2005; García-Feijóo and Jorgensen 2010), higher macroeconomic risk of value stocks (e.g., see Cooper 2006; Gulen et al. 2011), shorter cash flow duration of value stocks (e.g., see Da 2009; Chen 2017), or by higher aggregate cash flow risk of value firms (e.g., see Campbell and Vuolteenaho 2004; Da and Warachka 2009). By contrast, the number of studies providing risk-based explanations for the accrual anomaly is remarkably lower. Among the few such studies, Chichernea et al. (2015) show that the magnitude of the accrual anomaly varies systematically over time, and that this variation is significantly related to cross-sectional return dispersion of stocks, which is a macroeconomic variable linked with other growth-related anomalies. According to their results, low-accrual firms have, on average, significantly higher exposure to the risk captured by return dispersion that significantly explains future returns so that the anomalous returns of the accrual anomaly shrink in magnitude and become insignificant during periods of low return dispersion.
Another recent risk-based explanation for the accrual anomaly is provided by Ball et al. (2016) who state that high-accrual firms earn lower future returns because they are less profitable on a cash basis. Their findings also explain why the accrual anomaly increases when evaluated using an asset pricing model that includes a profitability factor: this happens because accruals allow the regression to extract the cash-based component from the accruals-based profitability variable.
Because fundamental strength measures, such as FSCORE (among others), proxy for expected profitability, the relation between fundamental strength and future returns is also in line with risk-based explanations, since according to the standard valuation equation, ceteris paribus, higher expected profitability implies a higher discount rate, and consequently, a higher required return (e.g., see Fama and French 2006). Although both the gradual incorporation of information and risk-based explanations imply a positive relation between expected profitability and subsequent returns, the reasons for this relation are different; according to risk-based explanations, changes in investors' expectations are immediately impounded into stock prices, and the prediction power of fundamental strength on future returns arises because higher expected profitability is associated with higher risk. According to mispricing hypotheses, it arises from relative over- and/or undervaluation, suggesting that subsequent returns represent delayed corrections of prices towards fundamental value. As discussed by Lewellen (2010), it is often challenging to classify to which of these two main categories a theoretical explanation falls. In many cases, anomaly profits may be partially explained by both behavioral and risk-related reasons (see, e.g., Polk and Sapienza 2009; Engelberg et al. 2018). On the other hand, it is important to note that for an anomaly to survive, at least a part of anomalous performance should remain unexplained.
A third group of explanations for the accounting anomalies relies on data snooping or other biases related to data (e.g., see Linnainmaa and Roberts 2018) or research design (Harvey et al. 2016). However, in the light of long-term evidence of the examined anomalies, it seems extremely unlikely that they could be fully explained by data- or methodology-related biases (e.g., see Lewellen 2010; Pätäri and Leivo 2017), although the anomaly literature is certainly not free of such biases.

Sample selection and methodology
The portfolios are composed of non-financial German stocks quoted on the Frankfurt Stock Exchange (FSE). We examine 12 primary portfolio-formation criteria, of which six (i.e., book-to-price (B/P), earnings yield (E/P), sales-to-price (S/P), operating cash flow-to-price (CFO/P), market leverage (MLEV), and accruals-to-assets (ACCR)) are stand-alone criteria (calculated as described in "Appendix 2"). The remaining six represent combination criteria, in which the average of the rankings based on two criteria at each portfolio-formation point determines the composite rank score for each firm. Based on earlier literature, these six criteria are based on the corresponding combinations of B/P and E/P, B/P and return on equity (ROE, defined as in "Appendix 2"), B/P and ACCR, B/P and S/P, CFO/P and S/P, and E/P and S/P (footnote 7). Similar to Choi and Sias (2012), Duong et al. (2014), Hyde (2018), and Walkshäusl (2020), we also form plain high-FSCORE (from 7 to 9) portfolios to compare their performance with the portfolios formed on the 12 primary criteria, as well as with the corresponding FSCORE-boosted portfolios.
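The rank-averaging behind the combination criteria can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the helper names `ranks` and `composite_rank`, the convention that rank 1 is the most attractive, and the alphabetical tie-breaking are all assumptions made for the example.

```python
from typing import Dict


def ranks(values: Dict[str, float], higher_is_better: bool = True) -> Dict[str, float]:
    """Rank firms from 1 (most attractive) to N; ties are broken by firm name (an assumption)."""
    ordered = sorted(
        values,
        key=lambda firm: (-values[firm] if higher_is_better else values[firm], firm),
    )
    return {firm: float(position + 1) for position, firm in enumerate(ordered)}


def composite_rank(
    first: Dict[str, float],
    second: Dict[str, float],
    first_higher_is_better: bool = True,
    second_higher_is_better: bool = True,
) -> Dict[str, float]:
    """Average the two single-criterion ranks for every firm (lower = more attractive)."""
    rank_first = ranks(first, first_higher_is_better)
    rank_second = ranks(second, second_higher_is_better)
    return {firm: (rank_first[firm] + rank_second[firm]) / 2.0 for firm in first}
```

For a combination such as B/P and ACCR, the second criterion would be ranked with `second_higher_is_better=False`, because low accruals are the attractive end of that ranking.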
To maintain the best possible comparability of our results with Pätäri et al. (2018c), we focus on the performance of long-only portfolios and employ the same sample period, from May 2000 to April 2015. To be included in the portfolios, the firms must have financial statement variables available for the two fiscal years preceding each portfolio-formation point. To avoid survivorship bias, the sample also includes the stocks of the companies that were delisted during the sample period. If an issuer has had two or more stock series listed, only the one with the highest trading volume is included in the sample. To ensure that all firm-year observations are equally fresh when forming the quantile portfolios, the firm-years with fiscal year ends other than December are excluded from the sample. Following Schmidt (2017), the stocks with a market capitalization below €50 million are excluded to avoid the results being biased by microcaps. The stock price data (adjusted for dividends, splits and capitalization issues) is from Thomson Reuters Datastream, whereas the financial statement data is from Worldscope. One-month EURIBOR (downloaded from Datastream) is used as a proxy for the risk-free rate of return. To be included in the sample of investable stocks at each portfolio-formation point, the firms must have all the information available for the calculation of all the 13 selection criteria. Although this prerequisite reduces the number of otherwise usable firm-year observations, it enables the best possible comparison of the results based on different single selection criteria and/or combination criteria, consistent with Dhatt et al. (2004) and Pätäri et al. (2018a). In line with the existing literature (e.g., see Fama and French 2008), we also exclude firm-year observations for which the book value of equity is negative.
After all exclusion criteria, the final sample size ranges from 181 firms in the year 2000 to 256 in 2008, comprising 3,364 firm-year observations with complete data for each of the 13 selection criteria over the period 2000-2015. Accounting data is from the financial statements of the year preceding a portfolio-formation year (or from the two or three latest financial statements in cases where financial statement variables are based on their averages over two consecutive fiscal years or their year-to-year changes). Because our trading portfolios are (re-)formed four months after a fiscal year end (in line with Piotroski and So 2012) to ensure that financial statement data for that particular fiscal year would certainly have been published before the portfolio updates, this four-month lag enables calculating the market values of equity used in the denominators of the valuation multiples based on the closing prices of the latest trading day preceding the first of May. This practice is also followed in our study because it is rational to use the latest available information in portfolio formation, consistent with Lakonishok et al. (1994) and Asness et al. (2013), among others. Sample characteristics are shown in Table 1, in which cross-sectional medians for the six single selection criteria, as well as for the market value of equity, are reported at each portfolio-formation point.
The sample stocks are first ranked based on valuation multiples, accruals-to-assets, market leverage or combination criteria calculated on the day preceding every rebalancing date, that is, the last trading day of April. Next, we divide the sample firms into tertile portfolios based on each of the above-mentioned primary selection criteria. Following Lakonishok et al. (1994), we re-form the portfolios by using three different updating frequencies of one year, three years and five years (footnote 8). The FSCORE-boosted portfolios are then formed of the stocks with the three highest FSCOREs (i.e., from 7 to 9) within the tertile that, according to the anomaly literature, should include the best-performing stocks of the future (for stand-alone value portfolios, the best-performing tertile should be the top tertile that consists of the stocks with the highest fundamental-to-price ratios, whereas in the case of accruals, the best-performing tertile should be the bottom tertile). To better extract the added value of the FSCORE, we compare the performance of the FSCORE-boosted portfolios with the corresponding primary quantile portfolios in which the number of constituent stocks is set equal to that of the high-FSCORE stocks in the corresponding tertile portfolio for each holding period (footnote 9). For a realistic performance comparison of trading portfolios, we form monthly time series for each portfolio by taking account of both the weight changes within the holding periods caused by price fluctuations of the stocks in the investment portfolios and the transaction costs incurred from the purchases and sales of the stocks. The stocks included in each portfolio are equal-weighted at each portfolio-reformation point (footnote 10).
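The two-stage selection described above (a tertile cut on the primary criterion, then the high-FSCORE filter, with a size-matched primary comparison portfolio) can be sketched as follows. This is an illustrative sketch only: the function names, the integer-division tertile size, and the alphabetical tie-breaking are assumptions, not details taken from the paper.

```python
from typing import Dict, List, Tuple


def best_tertile(scores: Dict[str, float], higher_is_better: bool = True) -> List[str]:
    """Return the most attractive third of firms, best first.

    Tertile size is len // 3 and ties are broken by name; both are assumptions.
    """
    ordered = sorted(
        scores,
        key=lambda firm: (-scores[firm] if higher_is_better else scores[firm], firm),
    )
    return ordered[: len(ordered) // 3]


def form_portfolios(
    scores: Dict[str, float],
    fscores: Dict[str, int],
    higher_is_better: bool = True,
    threshold: int = 7,
) -> Tuple[List[str], List[str]]:
    """Return (size-matched primary portfolio, FSCORE-boosted portfolio)."""
    tertile = best_tertile(scores, higher_is_better)
    # FSCORE-boosted portfolio: high-FSCORE stocks (7 to 9) within the best tertile.
    boosted = [firm for firm in tertile if fscores[firm] >= threshold]
    # Comparison portfolio: the same number of stocks, best-ranked on the primary criterion.
    primary = tertile[: len(boosted)]
    return primary, boosted
```

For the accruals criterion, `higher_is_better=False` would select the bottom tertile, in line with the text above.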
Footnote 8 (continued): also calculated the corresponding performance statistics for the 3- and 5-year holding periods by shifting the starting point of the portfolio formation forward by one year at a time. The un-tabulated results based on such overlapped data were qualitatively similar to our reported results).
Footnote 9: For example, when the number of high-FSCORE stocks (determined on the basis of 1999 financial statements) included in the CFO/P top-tertile in spring 2000 is 24, the ultimate CFO/P value quantile portfolio being compared against the combined CFO/P and FSCORE portfolio consists of the same number of the highest CFO/P stocks during the subsequent holding period.
Footnote 10: Although we also calculated the corresponding portfolio performance statistics on a value-weighted basis, we henceforth focus on the results based on our main portfolio-formation methodology (i.e., on equal-weighting at each (re)-formation point) for the following reasons: For many sub-periods and portfolio-formation criteria, value-weighting resulted in very poorly diversified portfolios in cases where only one or two megacaps were included beside many smaller-cap firms (as an example, one megacap stock got 75% weight in a portfolio of 28 stocks, thereby reducing the average weight for the remaining 27 stocks to less than 1% per stock). As a result of this, the value-weighted portfolios generally have a remarkably higher idiosyncratic risk, and consequently, worse risk-adjusted performance statistics. In addition, given that value-weighting attenuates all the anomalies being examined (e.g., see Fama and French 2008; Taylor and Wong 2012; Novy-Marx 2014; Hou et al. 2020), hardly any portfolio manager aiming to exploit these anomalies would weight the constituent stocks based on their market caps when deciding on equity portfolio allocation. Therefore, from the viewpoint of practical implementability of trading strategies, which is the focus of this study, our approach is more realistic. In general, the value-weighted results are qualitatively similar to our main results in the respect that on average, the FSCORE boost is clearly at its strongest for the 1-year holding period length, attenuating towards longer holding periods to the extent that among the value-weighted portfolios with a 5-year updating frequency, the net return difference between the primary and FSCORE-boosted portfolios is practically zero (the average net returns are 10.42% p.a. vs. 10.43% p.a., respectively). Moreover, the FSCORE boost for the two shorter holding periods is somewhat weaker than documented in our main analysis (in terms of the average value-weighted net returns comparable to those reported in Tables 2 and 3, the return difference in favor of the FSCORE-boosted portfolios is 3.41% (0.56%) p.a. for the 1-year (3-year) updating frequency).
The portfolio weights for the stocks that remain in the portfolio over the portfolio-reformation point are rebalanced at each checkpoint by buying or selling a proportion of the existing share of ownership to the extent that the new weight corresponds to the inverse of the number of portfolio firms in the rebalanced portfolio as closely as possible. The same principle is followed when determining the number of new entrants' stocks to be bought. As a result, monthly return time series for each quantile portfolio over a 15-year investment period are obtained.
Table 1 Sample characteristics
For each of the six single portfolio-formation criteria examined, the table presents annual medians, as well as their full sample period averages (The medians of CFO/P and E/P are in percentages). The second-last column provides the corresponding statistics on the market equity values (MV) of the sample firms (in million euros). The right-most column shows the number of sample firms in each one-year subperiod, whereas the left-most column indicates the time points (as end of April each year) for which the statistics are calculated (see "Appendices 1 and 2" for the calculation principles for each characteristic). ACCR refers to accruals-to-assets, whereas MLEV refers to market leverage (i.e., total assets/market value of equity).
Over the 15-year full sample period from May 2000 to April 2015, the table presents the annualized geometric average returns net of transaction costs (NR), volatilities (σ), SKADs (all in percentages), the Sharpe ratios (SR), SKASRs, information ratios (IR), SKAIRs, 4-factor alphas (p.a.) and the slopes for market excess return (denoted as β), SMB, HML, and WML factors, as well as the adjusted R-squareds for the quantile portfolios formed on the 13 long-only portfolio-formation criteria (named on the column header line). The third column from the right shows the corresponding averages for the 12 primary criteria, whereas the right-most column shows the corresponding statistics for the market portfolio, if appropriate. The p values (in parentheses below each statistic) are in percentages (The p values for the SR differences indicate the significances for the outperformance over the stock market portfolio, whereas those for the alphas show the significances for abnormal return over the 4-factor asset pricing model introduced in "Appendix 3" (Eq. 5). The p values of the factor slopes indicate the significances for the explanatory power of each factor). The significance of each test statistic at the 1% (5%) level is shown by * (**) on the rows above the corresponding p values. The average number of constituent stocks in each quantile portfolio (denoted as n) is shown on the bottom row of each panel. Panel A shows the statistics for the annually re-formed portfolios, whereas Panel B (C) shows the corresponding results for the portfolios formed on every third (fifth) year. ACCR refers to accruals-to-assets, whereas MLEV refers to market leverage (i.e., total assets/market value of equity).
Based on the realized one-way transaction costs of institutional investors in the German stock market reported in Domowitz et al. (2001), we use a transaction-cost estimate of 0.377% per trade for all purchases and sales. Given the development in the microstructure of the German stock market during the 2000s, it is possible that the updated one-way transaction costs for institutional investors would have been somewhat lower during the period from May 2000 to April 2015. 11 However, because the amount of possible reduction is difficult to estimate and more recent trading-cost statistics indicate that this amount, if existent, is small even at its highest (particularly when averaged over the 15 years), we use the Domowitz et al. (2001) estimate in order not to underestimate the real-world implementation costs.
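The equal-weight reset and the 0.377% one-way cost described above can be illustrated with a minimal sketch. It assumes that the cost is charged in proportion to the weight bought or sold at the re-formation point; the function name, the ticker labels, and this proportional-cost convention are illustrative assumptions, not the exact procedure of the study.

```python
ONE_WAY_COST = 0.00377  # 0.377% per trade (Domowitz et al. 2001 estimate quoted in the text)

def rebalance_to_equal_weight(drifted_weights, new_members, cost_rate=ONE_WAY_COST):
    """Return (target_weights, traded_weight, cost) for an equal-weight reset.

    drifted_weights: dict ticker -> weight just before re-formation (sums to 1).
    new_members: tickers of the re-formed portfolio (hypothetical labels).
    """
    members = list(new_members)
    target = {t: 1.0 / len(members) for t in members}
    tickers = set(drifted_weights) | set(target)
    # Assumption: every unit of weight bought or sold pays the one-way cost.
    traded = sum(abs(target.get(t, 0.0) - drifted_weights.get(t, 0.0)) for t in tickers)
    return target, traded, traded * cost_rate

# Stock C leaves the portfolio, stock D enters, A and B are trimmed or topped up.
target, traded, cost = rebalance_to_equal_weight(
    {"A": 0.5, "B": 0.3, "C": 0.2}, ["A", "B", "D"]
)
```

In this toy case about 73% of portfolio value changes hands, so the re-formation costs roughly 0.28% of portfolio value under the stated assumption.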

Test procedures for performance comparisons
The reported performance statistics for the quantile portfolios are based on six performance metrics: the raw return, the Sharpe ratio (Sharpe 1966), the skewness- and kurtosis-adjusted Sharpe ratio (SKASR), the Carhart (1997) 4-factor alpha, the information ratio (in line with Goodwin 1998), and the skewness- and kurtosis-adjusted information ratio (SKAIR). Given that for the sample data employed, all these performance measures produce relatively consistent portfolio rankings, our written analysis focuses on the first two, complemented with the latter four only in cases in which the status of the best-performing portfolio is different to that indicated by either raw returns or the Sharpe ratio. The formulas and the justifications for the performance metrics with definitions of the related variables are provided in "Appendix 3". The statistical significance of the reward-to-risk ratio differences between comparable pairs of portfolios is determined on the basis of the p values of the Ledoit-Wolf (2008) test, 12 which is based on the circular block bootstrap method. We also test the significance of 4-factor alphas based on their t-statistics. Throughout the study, we use the Newey-West (1987) standard errors in the regressions to avoid problems related to autocorrelation and heteroscedasticity.
For the 12 FSCORE-boosted long-only portfolios named after the primary formation criterion (shown on the column header line), the performance statistics analogous to those presented in Table 2 for the comparable primary portfolios are shown. The p values below the SKASR and SKAIR statistics indicate the significance level at which an FSCORE-boosted portfolio has outperformed the comparable primary portfolio (in terms of the SR and IR differences, respectively). The significance of the corresponding Ledoit-Wolf (2008) …

Results for the primary quantile portfolios and the plain FSCORE portfolios
Table 2 shows the performance statistics of the 36 primary quantile portfolios and the 3 high-FSCORE portfolios for the 1-, 3-, and 5-year holding periods. Among the 39 comparable portfolios, the highest net return (16.94% p.a.) is generated by the low-ACCR portfolio that is re-formed every third year. The finding is somewhat surprising given the existing U.S. evidence, according to which the accrual anomaly seems to decay as the holding period is extended beyond one year (e.g., see Sloan 1996; Xie 2001; Khan 2012). To the best of our knowledge, this is also the first time that such evidence for the use of a holding period length exceeding one year for the ACCR criterion has been reported in the related literature. The finding is also surprising in that, in their comparisons of the same 13 portfolio-formation criteria within a larger sample of German stocks, Pätäri et al. (2018c) report the lowest raw return (only 1.48% p.a. before transaction costs) among the 13 comparable portfolios for the low-ACCR criterion. The observed divergence is only partially explained by the differences in holding period lengths, as the net return for the annually re-formed low-ACCR portfolio after transaction costs is outstandingly higher in our study (i.e., 13.08%) than the previously mentioned return reported in Pätäri et al. (2018c). Therefore, the major part of the divergence must be explained by differences in market-cap thresholds and in the employed high-FSCORE thresholds. Because the return difference between these two studies is so remarkable for low-ACCR portfolios, it is obvious that the removal of microcap firms below a €50 M market-cap has an increasing impact on the return of the low-ACCR portfolio in the German stock market during the sample period. 13 In this respect, our results are parallel with earlier evidence from other stock markets: e.g., for the same accrual measure as that used in our study, Hafzalla et al.
(2011) show that size-adjusted returns of the lowest-ACCR decile US stocks were not significantly positive, whereas they were significantly positive for most of those upper ACCR deciles that consisted of negative ACCR stocks. 14 The authors also reported that the average market equity was remarkably lower for the lowest-ACCR decile firms than in any other deciles that consisted of less negative ACCR stocks. These two findings give indirect evidence that the poor performance of the lowest-ACCR firms might be explained by the fact that many of the lowest-ACCR firms would most likely have been such microcaps, whose presence in the lowest-ACCR decile portfolio has been detrimental to its performance. Similar U.S. evidence is also reported by Kim and Kim (2017), who show that a small subset of firms having both highly negative accruals and cash flows worsens the performance of the low-accrual stocks that are typically small in terms of market capitalization. Moreover, the results of Lev and Nissim (2006), according to whom extreme-accrual firms are mostly small and risky with low profitability, give further rationales for the observed performance enhancement of the low-accrual portfolio stemming from the removal of microcaps from the universe of investable stocks. A negative impact of microcaps on the performance of the lowest-ACCR portfolio is also implied by the Korean evidence introduced by Kim et al. (2015). 15
13 Because the performance enhancement from including the FSCORE criterion beside the low-ACCR criterion is so great in the results of Pätäri et al. (2018c), it is evident that the low-ACCR microcaps with an FSCORE lower than 8 are the main reason for the poor performance of the low-ACCR stand-alone portfolio in their study (According to their results, the annually re-formed FSCORE-boosted low-ACCR portfolio generated an annual return of 20.92% (before transaction costs), which is as much as 19.44 percentage points higher than the return of the comparable low-ACCR stand-alone portfolio).
14 As for our German sample data, the majority of the U.S. stocks included in Hafzalla et al.'s sample from the period of 1989 to 2008 are negative ACCR stocks (On average, the proportion of negative ACCR stocks is more than 70%, thereby indicating that the seven lowest-ACCR deciles consist merely of negative ACCR stocks).
15 Although the authors do not directly test the role of microcaps in explaining the low returns of the lowest-ACCR decile portfolio, their descriptive statistics (particularly, the market equity medians reported for the decile portfolios) reveal that the proportion of the tiniest microcaps is greatest in the lowest-ACCR portfolio.

The highest Sharpe ratio (0.882), as well as the highest information ratio (0.976) among the 39 quantile portfolios included in Table 2 are reported for the annually re-formed stand-alone FSCORE portfolio. Its top position in terms of the Sharpe ratio is based on its low volatility, but it should be noted that its return distribution is relatively heavily skewed to the left. Consequently, the highest SKASR (0.813) is documented for the same low-ACCR portfolio that also generates the highest net return among these 39 portfolios. However, the greatest 4-factor alpha (6.68% p.a.) within the same peer group is documented for the annually re-formed low-ACCR portfolio. 16 In general, all the 39 quantile portfolios included in Table 2 generate higher net returns and Sharpe ratios than the German market portfolio. Based on the Ledoit-Wolf test, the outperformance is significant (at the 5% level) for seven of the 13 annually re-formed portfolios, and for eight of the 13 portfolios with a 3-year updating frequency. By contrast, a 5-year updating frequency generates only three significant outperformance cases for the portfolios formed using either CFO/P or the FSCORE as a stand-alone criterion or the combination of S/P and CFO/P. These three portfolios are also specific in that they all outperform the market portfolio significantly, regardless of the holding period length employed. Interestingly, B/P works poorly as a stand-alone criterion, as the three worst-performing portfolios among the 39 quantile portfolios during the sample period are all B/P portfolios with different holding period lengths.
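For readers who want to reproduce the two reward-to-risk measures referred to above from a monthly return series, a minimal sketch follows. The exact definitions used in the study are given in its "Appendix 3", which is not reproduced here; the arithmetic means and the square-root-of-12 annualization below are therefore assumptions, and the function names are illustrative.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free, periods_per_year=12):
    """Annualized mean excess return per unit of excess-return volatility."""
    excess = [r - f for r, f in zip(returns, risk_free)]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

def information_ratio(returns, benchmark, periods_per_year=12):
    """Annualized mean active return per unit of tracking error."""
    active = [r - b for r, b in zip(returns, benchmark)]
    return statistics.mean(active) / statistics.stdev(active) * math.sqrt(periods_per_year)
```

The two functions differ only in the series that is subtracted: the risk-free rate for the Sharpe ratio and the benchmark return for the information ratio.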

16 To the best of our knowledge, our study is somewhat surprisingly the first to document a significant anomalous performance for the German long-only low-ACCR portfolio that is formed in accordance with the most commonly used accrual measure suggested by Sloan (1996), as previous studies on the accrual anomaly in the German market (e.g., LaFond 2005; Pincus et al. 2007; Kaserer and Klingler 2008; Gegenfurtner et al. 2009) have employed the cash flow approach introduced by Collins and Hribar (2000) in the calculation of accruals. In general, the literature on the accrual anomaly in Germany is surprisingly scant, particularly in the light of the evidence for high earnings manipulation among German firms (e.g., see Leuz et al. 2003; van Tendeloo and Vanstraelen 2005; Haw et al. 2012).

Results for the FSCORE-boosted portfolios
The results for the FSCORE-boosted portfolios (in Table 3) clearly show a performance enhancement stemming from the inclusion of the high-FSCORE threshold alongside the primary criteria. The average net return calculated over the 36 comparable portfolios increases by 2.79% points from 12.17% p.a. documented for the portfolios formed on the 12 primary criteria alone to 14.96% p.a. for the comparable FSCORE-boosted portfolios. The comparison of the corresponding portfolios with different holding period lengths reveals that the average return enhancement caused by the FSCORE boost is strongest (4.91% points) among the annually re-formed portfolios, followed by the portfolios that are re-formed every third year, among which the average return increase is 3.22% points. By contrast, for the portfolios updated at a 5-year frequency, the average return boost stemming from the inclusion of the FSCORE threshold is only 0.23% points. In addition, their average return is lower than the corresponding averages of the portfolios re-formed annually or every third year. Thus, in terms of raw returns, the benefits of the inclusion of the FSCORE threshold alongside the primary criteria are, on average, concentrated in the 1- and 3-year holding period lengths.
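The averages quoted in this paragraph can be cross-checked with simple arithmetic. Because each holding-period group contains the same number of portfolios (12), the overall boost is the plain mean of the three group boosts. The variable names in this short check are illustrative; the figures are those quoted in the text.

```python
# Average FSCORE boost per holding period (percentage points p.a., from the text).
boost_by_holding_period = {1: 4.91, 3: 3.22, 5: 0.23}
average_boost = sum(boost_by_holding_period.values()) / len(boost_by_holding_period)

# Average net returns over the 36 comparable portfolios (% p.a., from the text).
primary_avg, boosted_avg = 12.17, 14.96

print(round(average_boost, 2))               # 2.79
print(round(boosted_avg - primary_avg, 2))   # 2.79
```

Both routes give the 2.79 percentage points reported for the 36 comparable portfolios.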
In terms of the Sharpe ratios, the results are similar to those based on raw returns in the respect that the greatest average FSCORE boost is documented for the annually re-formed portfolios, followed by the portfolios re-formed every third year. In contrast with the results of the primary portfolios, for many of which the extension of the holding period from one year to three years improves performance, the same seldom holds for the FSCORE-boosted portfolios: the only FSCORE-boosted case where the performance improves in terms of both raw and total risk-adjusted returns by using a 3-year holding period instead of a 1-year holding period is documented for the low-ACCR portfolio. On the other hand, this particular portfolio is the best performing among all the examined 75 portfolios based on the majority of the performance metrics employed, generating the highest net return of 20.27% p.a., the highest Sharpe ratio and SKASR (1.067 and 1.016, respectively), as well as the greatest 4-factor alpha (7.02% p.a.). Hence, our overall results show that, at least for the selection of long-only portfolios in the German universe of non-microcap stocks and when coupled with the FSCORE, the low-ACCR criterion would have been the most efficient among the 12 comparable primary portfolio-formation criteria. 17 Over the 15-year sample period, this criterion would have been particularly successful had the portfolios been re-formed every third year.
The performance boost from the inclusion of the high-FSCORE criterion is also evident in the numbers of portfolios that significantly outperform the market portfolio. For as many as 28 of the 36 FSCORE-boosted portfolios, a significant outperformance over the market portfolio (at the 5% level) is documented, whereas the corresponding outperformance is reported for 15 of the 36 comparable primary portfolios. Interestingly, all of the FSCORE-boosted portfolios re-formed at either a 1- or 3-year frequency significantly outperform the market portfolio (at the 5% level) during the 15-year sample period. For all 12 of the annually re-formed FSCORE-boosted portfolios, the outperformance is significant even at the 1% level. By contrast, among the same group of portfolios re-formed every fifth year, only four out of 12 such portfolios outperform the market portfolio at the 5% significance level. On the other hand, it should be noted that only two out of the 12 comparable portfolios re-formed every fifth year on the basis of the primary criteria alone (in Table 2, Panel C) significantly outperformed the market portfolio at the 5% level. This comparison shows that the FSCORE boost also holds for the 5-year holding period length, although it is clearly stronger for the two shorter holding period lengths employed.
17 Although the sign of accruals is also one of the nine FSCORE constituents, this is not the reason for the highest efficacy of the combination of low-ACCR and high-FSCORE criteria, because the descriptive statistics (in Table 1) indicate that for that specific constituent, all the low-tertile ACCR stocks have the same binary value (i.e., unity) throughout the sample period, thereby indicating that within the sample of low-tertile accrual stocks, the cross-firm differences in FSCOREs are not affected by accruals (A similar logic also reinforces that the role of the CFO is not overweighted when forming the FSCORE-boosted high-CFO/P portfolio).
Altogether, the inclusion of the FSCORE threshold alongside the primary criterion improves the Sharpe ratios in 30 of the 36 comparable cases. In ten of 30 such cases, the performance improvement is also statistically significant at the 5% level. The performance enhancement is strongest among the annually re-formed portfolios, among which all of the Sharpe ratios are higher for the FSCORE-boosted portfolios than for the 12 comparable primary portfolios. Based on the Ledoit-Wolf test for the Sharpe ratio differences, seven of the 12 annually re-formed FSCORE-boosted portfolios significantly (at the 5% level) outperformed their primary counterpart portfolios. Corresponding comparisons between the portfolios re-formed at a 3-year and a 5-year frequency reveal that although the majority of such FSCORE-boosted portfolios also generate higher Sharpe ratios (compared to those of the corresponding primary portfolios) for longer holding period lengths, the performance difference is statistically significant in only three of the 12 comparable cases for a 3-year holding period length, whereas it is not significant (at the 5% level) in any of the 12 comparable cases for a 5-year holding period length. In terms of statistical significances of the information ratio differences, the evidence for the FSCORE boost is mostly similar, with only minor deviations: for a 1-year (3-year/5-year) holding period length, six (four/one) out of the 12 FSCORE-boosted portfolios significantly outperform their primary counterpart portfolios at the 5% level. Overall, the FSCORE boost is greatest for the most conventional holding period length employed in the related literature (i.e., for annually re-formed portfolios).
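The resampling idea behind these Sharpe-ratio comparisons can be sketched as follows. This is a deliberately simplified illustration of a circular block bootstrap for a Sharpe-ratio difference, not the Ledoit-Wolf (2008) procedure itself, which studentizes the test statistic and chooses the block length from the data. The fixed block length, the number of draws, and all function names are assumptions; the inputs are assumed to be monthly excess returns of the two portfolios being compared.

```python
import random
import statistics

def sharpe(xs):
    """Per-period Sharpe ratio of a series of excess returns."""
    return statistics.mean(xs) / statistics.stdev(xs)

def circular_block_indices(n, block_len, rng):
    """Draw n time indices as blocks of consecutive months that wrap around the end."""
    idx = []
    while len(idx) < n:
        start = rng.randrange(n)
        idx.extend((start + k) % n for k in range(block_len))
    return idx[:n]

def bootstrap_sharpe_diff_pvalue(a, b, block_len=5, draws=2000, seed=0):
    """Two-sided p value for H0: Sharpe(a) == Sharpe(b); a and b are resampled jointly."""
    rng = random.Random(seed)
    n = len(a)
    observed = sharpe(a) - sharpe(b)
    exceed = 0
    for _ in range(draws):
        idx = circular_block_indices(n, block_len, rng)
        diff = sharpe([a[i] for i in idx]) - sharpe([b[i] for i in idx])
        # Centre the bootstrap differences on the observed one to mimic the null.
        if abs(diff - observed) >= abs(observed):
            exceed += 1
    return (exceed + 1) / (draws + 1)
```

Resampling the same months for both portfolios preserves their cross-correlation, and resampling in blocks preserves some of the serial dependence in monthly returns.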
Among the 12 primary criteria, the strongest FSCORE return boost is reported for B/P, for which it is 5.73% p.a. based on averages over the three examined holding period lengths (calculated as the average annual return differences between the portfolios formed on each of the primary criteria and those formed by using the FSCORE as a supplementary criterion). The highest overall return increase stemming from the inclusion of the FSCORE is documented for the annually re-formed B/P portfolio that would have yielded 8.55% points (p.a.) more when supplemented with the FSCORE than without the FSCORE boost. Interestingly, the differences in magnitude of the FSCORE boost between the 12 criteria, and even between the four stand-alone value criteria, are remarkable: for example, the FSCORE boost, quantified in terms of the average incremental return calculated over the three holding period lengths, is negative for the CFO/P portfolio, whereas the B/P and E/P portfolios would have outstandingly benefited from the inclusion of the FSCORE regardless of the examined holding period length. However, from an investor's viewpoint, more decisive than the magnitude of the FSCORE boost for each primary criterion is the overall performance of the FSCORE-boosted portfolios.
It should be noted that, with the exception of the comparisons between the primary portfolios and their FSCORE-boosted counterparts, the portfolios are not equal with respect to their breadth dimension, as the number of high-FSCORE stocks at each portfolio formation point varies within the tertiles from which such stocks are picked. In comparisons of the 12 primary criteria over the 15-year sample period, the average number of stock series in these portfolios varies from 20 (for the market leverage portfolios that are updated with a 5-year frequency) to 36 (for the E/P portfolios with 1- and 5-year updating frequencies), with an average of 28. Because of a significant negative correlation between the total risks of the examined portfolios and the numbers of stock series included in them, 18 the narrower portfolios have a higher idiosyncratic risk, on average. However, as idiosyncratic risk is a component of total risk, the differences in the breadth dimension of the portfolios are at least indirectly reflected in the total risk-adjusted performance metrics, such as the Sharpe ratio and the SKASR.
To compare the performance rank order for the group of 12 FSCORE-boosted portfolios reported in Pätäri et al. (2018c) with ours, we calculate the Spearman rank correlation coefficients based on four performance measures that were employed in both of these studies (i.e., raw returns, the Sharpe ratio, the SKASR, and the Carhart 4-factor alpha). All four coefficients are insignificant (at the 5% level), but two of them (i.e., those based on SKASR and the 4-factor alpha) are simultaneously weakly significant (at the 10% level) and negative, thereby indicating reverse, rather than consistent rankings. These results show that the previously discussed dependency of the relative performance ranking of the primary criteria on the employed market-cap and high-FSCORE thresholds 19 also holds for the FSCORE-boosted portfolios, although the spectrum of performance statistics is clearly narrower for the latter portfolios than it is for the portfolios formed on the primary criteria. With respect to the dependency of the relative performance of FSCORE-boosted portfolios on the market capitalization of constituent stocks, our results are in line with Tikkanen and Äijö (2018), who report varying performance rankings for the pan-European FSCORE-boosted large-, mid- and small-cap portfolios formed by using five value measures (i.e., B/P, E/P, dividend yield, EBIT/EV, and EBITDA/EV) and gross profitability as the primary criteria.
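The rank comparison described above rests on the Spearman coefficient, which is the Pearson correlation of the two rank vectors. A minimal sketch in plain Python is given below; the function names are illustrative, ties receive their average rank, and the significance test applied in the study is not reproduced.

```python
def average_ranks(values):
    """Rank from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A coefficient near -1 means that the criteria ranked highest in one study tend to be ranked lowest in the other, which is the direction of the weakly significant coefficients reported above.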
Tables 2 and 3 also show the 4-factor slopes for all the examined 75 portfolios. The most conspicuous phenomenon in factor exposures is that among the three spread factors, the highest number of significant factor exposures is documented for the momentum factor, although price momentum has not been employed as a portfolio-formation criterion at all in the research design. This somewhat surprising finding is, however, explained by untypical period-specific inter-relations of the factors: As an example, the German value and momentum factors have a very significant positive correlation (t-stat 5.75), whereas during the period preceding the sample period, over which the German factor data is available 20 (i.e., from July 1958), their cross-correlation was significantly negative (at the 5% level). In addition, the momentum factor has a very significant negative correlation with the market excess return factor over the period 05/2000-04/2015, while for the preceding period, their correlation was very close to zero and insignificant. In fact, all the pairwise correlations between the explanatory factors, except that between the size and market excess return factors, are highly significant during the sample period. However, variance inflation factors (VIFs) show that the degree of multicollinearity is not high enough to invalidate the overall goodness of fit of the employed 4-factor model. 21 The joint explanatory power of the four factors is within a reasonable range, varying from 51.11% (for the stand-alone B/P portfolio with a 5-year updating frequency) to 78.37% (for the annually re-formed plain FSCORE portfolio) in terms of adjusted R-squareds. 22 Nevertheless, individual factor exposures should be interpreted with caution and therefore, the following analysis on factor exposures focuses on the groups of portfolios instead of individual portfolios.
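The multicollinearity check mentioned above regresses each explanatory factor on the remaining ones and converts the resulting R-squared into a variance inflation factor, VIF = 1 / (1 - R²). The sketch below assumes NumPy is available and that the factor returns are held in a T-by-k array; the function name and the data layout are illustrative assumptions.

```python
import numpy as np

def variance_inflation_factors(factors):
    """Return one VIF per column of a (T, k) array of factor returns."""
    X = np.asarray(factors, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(y)), others])  # intercept + other factors
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs
```

A VIF of 1 means that a factor is uncorrelated with the others; the paper reports a maximum of 2.07 against a conservative upper limit of 5.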
Altogether, the factor exposures reported in Tables 2 and 3 do not reveal any prominent source for the FSCORE boost. The most eye-catching difference in factor slopes between the 12 primary portfolios and the comparable FSCORE-boosted portfolios is documented for the SMB exposures that, on average, are positive for the first type of portfolios, while being slightly negative for the FSCORE-boosted portfolios. However, the difference is not statistically significant. Although the negative average SMB return of the sample period could partially explain the incremental net returns stemming from the FSCORE boost (given that the FSCORE-boosted portfolios would, on average, be slightly tilted towards large-caps, whereas the reverse would hold for the primary portfolios), it should be noted that in the case of negative size premium, negative (positive) SMB slopes would decrease (increase) the corresponding alphas. In spite of this, the performance difference in favor of FSCORE-boosted portfolios is also remarkable in terms of the 4-factor alphas, thereby indicating that only a small portion of the documented performance difference is explained by the differences in market-cap tilts. This inference is also supported by the fact that in terms of absolute monthly average returns, the SMB factor has the lowest return among the four explanatory factors. Given that among the four factor slopes, the SMB slopes are also clearly closest to zero and mostly insignificant, 23 it is evident that the differences in size exposure could explain only a small part of the FSCORE boost in factor-adjusted performance comparisons.
21 We checked this by regressing each of the four factors on all three other explanatory factors. The highest VIF was 2.07 (in the case where the WML factor is the dependent variable), which is still clearly at a tolerable level (According to statistical literature (e.g., see Hair et al. 2006), the conservative upper limit for VIF is often set to 5). We also tested the incremental contribution of the HML (WML) factor to the 4-factor model by dropping out an examined factor and then running the regressions for a reduced (i.e., 3-factor) model. Based on the F-statistics calculated by dividing the difference between the sum of squares explained by the full (i.e., 4-factor) model and the comparable sum explained by the reduced model by the mean square error of the full model, we found that in the great majority of cases, both factors significantly increased the explanatory power of the regression model. The marginal contribution of the WML factor was particularly frequent, as the corresponding F-statistic was significant (at the 5% level) in 74 out of the 75 cases, the only exception being the annually re-formed low-ACCR primary portfolio, for which it was still weakly significant (at the 10% level).
22 We also tested several alternative 4-factor models with similar but differently calculated factors, in the spirit of Brückner et al. (2015). With respect to cross-correlation of the alternative variants of explanatory factors, the results remained qualitatively the same. However, the average adjusted R-squared is highest for the employed factors.
23 Significant SMB slopes are reported for only seven out of the 75 examined portfolios. In all such cases, these slopes indicate positive size exposures (i.e., small-cap tilts) for some primary portfolios. In five out of the seven cases, the portfolios belong to the group of the annually re-formed ones, whereas the two remaining significant cases are documented for the portfolios with a 3-year updating frequency (see Table 2 for details).
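The incremental-contribution test described in footnote 21 can be sketched as a partial F-statistic for dropping one factor. Since the full and reduced models share the same total sum of squares, the difference in explained sums of squares equals the reduced model's residual sum of squares minus the full model's. The sketch assumes NumPy and ordinary least squares with an intercept; the function names and the column layout of the factor matrix are illustrative assumptions.

```python
import numpy as np

def residual_sum_of_squares(y, X):
    """Residual sum of squares of an OLS regression of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def partial_f(y, factors, drop):
    """F-statistic (1 numerator degree of freedom) for dropping column `drop`."""
    y = np.asarray(y, dtype=float)
    factors = np.asarray(factors, dtype=float)
    rss_full = residual_sum_of_squares(y, factors)
    rss_reduced = residual_sum_of_squares(y, np.delete(factors, drop, axis=1))
    dof_full = len(y) - factors.shape[1] - 1   # observations minus slopes minus intercept
    mse_full = rss_full / dof_full
    return (rss_reduced - rss_full) / mse_full
```

The statistic would be compared with an F distribution with 1 and T - 5 degrees of freedom for a 4-factor model estimated on T months.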
With respect to exposures on the other two spread factors, the (un-tabulated) average t-statistics calculated for the portfolios grouped on the basis of the holding-period lengths show that for the annually re-formed portfolios, the value spread is, on average, a slightly more significant explanatory factor than the momentum spread, whereas for the less-frequently updated portfolios, the reverse holds. The shift towards stronger momentum exposure when extending the holding period length is intuitive in the sense that when the price recovery of the value stocks starts, it may continue for several years (e.g., see Lakonishok et al. 1994; Bird and Whitaker 2003; Pätäri et al. 2010). In light of the facts that the mean-reversion cycle is typically from three to five years (e.g., see De Bondt and Thaler 1985, 1987; Asness et al. 2013) and most of the loser stocks are simultaneously value stocks, 24 it is not so surprising that many value stocks start to behave more and more like momentum stocks when moving further from the portfolio (re)-formation point. 25 However, it is somewhat surprising that at least for this specific sample, among the portfolios with a 5-year updating frequency, the FSCORE-boosted portfolios have stronger value tilts than their primary counterpart portfolios in 11 out of the 12 comparable cases, the only exception being the case where E/P is the primary criterion. Among the primary portfolios formed every fifth year, only two significant value exposures are documented, whereas among the primary portfolios formed every third year, 11 out of the 12 corresponding HML slopes are significant and positive. The observed significance loss stemming from extending the holding period length from 3 to 5 years supports the style migration behavior of stocks, particularly as for all the 25 portfolios formed every fifth year, the momentum tilts are more significant than the corresponding value tilts.

Performance comparison in time-varying stock market conditions
A potential pitfall of the portfolios of high-FSCORE stocks is the dependence of their proportion on the state of the economy: The proportion of high-FSCORE firms in the investable universe of stocks tends to decrease during bad times, which makes the high-FSCORE portfolios less diversified in such conditions. Although this tendency is at least partially alleviated by lowering the high-FSCORE threshold from 8 to 7, it is worthwhile to examine what kind of impact this tendency has on the relative performance of the examined portfolios during time-varying stock market conditions. To find this out, we first divide the full-length sample period into bull and bear market periods according to the turning points of the German stock market. Analogous to Edwards et al. (2003), we use a 20% cumulative return (loss) of the market portfolio from the previous trough (peak) to the subsequent peak (trough) in the demarcation of bullish (bearish) periods (see Table 4 for details). Based on the DAX100 index, four separate bullish and bearish periods within the 15-year sample period are identified. Panels A and B in Table 4 show the average monthly market-adjusted returns over the pooled bearish and bullish periods, respectively (The market-adjustment is made by subtracting the equal-weighted average cross-sectional monthly return of all sample stocks from each monthly portfolio return).

24 As an example of global evidence for this, Asness et al. (2013) report that the returns of a stock portfolio formed from the negative of past 5-year returns of U.S., European and Japanese stocks have an average correlation of 0.86 with the returns of the comparable B/P portfolio.

25 According to earlier literature on style migration, the change of style characteristics of stocks over time partially explains the value premium, when some of the previous value stocks gradually turn into growth stocks and vice versa (e.g., see Fama and French 2007a,b; Broussard et al. 2016). Because growth stocks are very often momentum stocks as well, the migration effect may also explain why value stocks may get momentum-like characteristics during longer holding periods (For the sample period employed, the returns of the short (i.e. growth) leg of the German value spread are highly correlated with the returns of the long leg of the corresponding momentum spread. On value-weighted (equally-weighted) basis, their correlation is 0.75 (0.82) with a t-stat of 15.32 (19.10)).
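The 20% demarcation rule described above can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the procedure actually used for Table 4: the function names `turning_points` and `label_months`, the assumption that the series starts in a bullish state, and the toy index levels are all hypothetical.

```python
def turning_points(levels, threshold=0.20, start_state="bull"):
    """Find confirmed peaks and troughs in a series of month-end index levels.

    A peak is confirmed once the index has fallen by `threshold` from it;
    a trough is confirmed once the index has risen by `threshold` from it.
    The initial state is an assumption of this sketch.
    """
    state = start_state
    ext_idx = 0  # index of the running peak (bull) or running trough (bear)
    points = []
    for i in range(1, len(levels)):
        level, extreme = levels[i], levels[ext_idx]
        if state == "bull":
            if level >= extreme:
                ext_idx = i  # new running peak
            elif level <= extreme * (1 - threshold):
                points.append((ext_idx, "peak"))
                state, ext_idx = "bear", i
        else:
            if level <= extreme:
                ext_idx = i  # new running trough
            elif level >= extreme * (1 + threshold):
                points.append((ext_idx, "trough"))
                state, ext_idx = "bull", i
    return points


def label_months(n_obs, points, start_state="bull"):
    """Label each monthly return (from observation i-1 to i) as bull or bear."""
    kinds = dict(points)
    state = start_state
    labels = []
    for i in range(1, n_obs):
        if (i - 1) in kinds:
            state = "bear" if kinds[i - 1] == "peak" else "bull"
        labels.append(state)
    return labels


# Hypothetical month-end index levels, for illustration only
levels = [100, 110, 120, 100, 90, 95, 85, 100, 105, 110]
points = turning_points(levels)
labels = label_months(len(levels), points)
```

In this sketch the months after a confirmed peak, up to and including the following trough, are treated as bearish; the final, not yet confirmed segment simply keeps the current state.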
The average market-adjusted returns are clearly higher during bearish periods than the corresponding returns in bullish periods, as the average market-adjusted returns (calculated over all 75 of the portfolios examined) for bearish and bullish periods are 1.54% and 0.20% p.m., respectively. A similar type of return generation pattern holds for all the holding period lengths employed. However, the average market-adjusted return during bullish periods is close to zero for the portfolios with a 5-year updating frequency, whereas the corresponding average calculated for the 25 annually re-formed portfolios is 0.38% p.m. For the portfolios updated annually or every third year, excess returns (over the market return) are, on average, higher in both bearish and bullish conditions for the FSCORE-boosted portfolios than for the corresponding primary portfolios. By contrast, for the 5-year holding-period portfolios, the average added value of the FSCORE boost is negative during bullish periods, while being clearly positive in bearish conditions.
The greatest average FSCORE boost in terms of market-adjusted returns is reported for the annually re-formed portfolios in bearish periods, during which the average return difference between the 12 primary portfolios and the comparable FSCORE-boosted portfolios is as high as 0.91% p.m. in favor of the latter. In the same portfolio comparisons, the market-adjusted returns are higher for the FSCORE-boosted portfolios in all 12 comparable cases. These findings indicate that the use of the FSCORE as a supplementary selection criterion alongside a primary criterion provides outstanding downside protection against declining stock markets, particularly among annually re-formed portfolios. Nevertheless, in terms of (untabulated) absolute returns, their average bearish-period return (−1.66% p.m.) is, as expected, clearly negative, yet considerably higher than the comparable return of the market portfolio (i.e., −3.57%). However, none of the 75 portfolios examined succeeds in avoiding losses in bearish conditions, whereas during bullish markets, all of them are profitable, although quite a few of them do not achieve the level of market return in such conditions. Among the 75 portfolios examined, we document 17 such cases, of which 13 are portfolios with a 5-year updating frequency. In bullish conditions, the FSCORE boost is, on average, greatest for the portfolios updated every third year, although the boost is nowhere near that reported for the annually re-formed portfolios during bearish periods.
To control for the impact of bullish (bearish) months within bearish (bullish) periods on the results presented above, we also split the full-length sample period into bullish months and bearish months based on the signs of stock market average returns in each month, similar to Fuller and Goldstein (2011). This demarcation criterion identifies 107 (73) months with positive (negative) returns. The results of this analysis are in line with the above-reported bullish- and bearish-period results and provide even stronger evidence for the downside protection provided by the FSCORE-boosted portfolios (see Panels C and D in Table 4). Moreover, the statistics reported for the stand-alone high-FSCORE portfolios in the right-most column of Table 4 show that their outperformance over the market portfolio is mostly driven by their tendency to decline far less in bearish conditions than the market portfolio, similar to the case of the FSCORE-boosted portfolios. In this respect, the high-FSCORE stocks behave similarly to value stocks but in contrast to high price-momentum stocks, which typically underperform in bearish conditions and outperform during bullish periods (e.g., see Pätäri et al. 2018a). In light of the findings according to which fundamental momentum and price momentum are more or less interrelated (e.g., see Chordia and Shivakumar 2006; Chen et al. 2014), the defensive characteristics of high-FSCORE stocks can be deemed somewhat surprising. On the other hand, Ahmed and Safdar's (2018) recent results imply that rather than overlapping, fundamental and price momentums are complementary.

Figure 1 illustrates the return accumulation of the market portfolio and certain trading portfolios throughout the 15-year sample period. The portfolios are chosen on the basis of their performance statistics: The two FSCORE-boosted low-ACCR portfolios are included as the best overall performers among the examined 75 portfolios. 26 The 3-year stand-alone low-ACCR portfolio is the best primary portfolio in terms of net returns, whereas the annually re-formed plain FSCORE portfolio has the best overall risk-adjusted performance statistics among the 39 portfolios included in Table 2. The 5-year plain FSCORE portfolio is included as the lowest-volatility portfolio, whereas the annually re-formed FSCORE-boosted B/P portfolio is the lowest-risk portfolio in terms of SKAD. Figure 1 shows that the relative return accumulation of these portfolios has remained rather stable throughout the sample period. The most notable exception to this general tendency is the return accumulation of the 3-year stand-alone low-ACCR portfolio, the value of which rose very rapidly after the end of the global financial crisis. By contrast, during the last two years of the sample period, this stand-alone portfolio has gained nowhere near what is documented for the two FSCORE-boosted low-ACCR portfolios included in the graph.

Crash risk exposure
To find out whether the FSCORE also affects the portfolios' crash risk exposures, we calculate the maximum drawdown (MDD) statistics for all 75 of the portfolios examined based on their month-end values. Similar to Choi et al. (2015), MDD is defined as the maximum percentage loss over any subinterval of the evaluation period. Table 5 shows that MDD statistics are consistent with the return generation patterns reported for bearish periods (bearish months) with respect to the strongest impact of the FSCORE boost on average MDDs clearly being documented for the annually re-formed portfolios, among which the average MDD decreases from −52.42% to −45.31% as a consequence of including the high-FSCORE criterion alongside the primary criteria. In the same portfolio comparisons, the MDDs are lower for the FSCORE-boosted portfolios in all 12 comparable cases, thereby indicating that among the annually re-formed portfolios, the adoption of the high-FSCORE criterion has remarkably decreased crash risk exposure. By contrast, and on average, the same does not hold for the portfolios re-formed every third or fifth year, since a similar decrease in MDDs stemming from the inclusion of FSCORE as reported for the annually re-formed portfolios is not observed for longer holding-period portfolios. As a matter of fact, the MDD for the 3-year holding-period primary portfolios is even slightly lower than it is for the corresponding FSCORE-boosted portfolios in spite of the fact that in the comparisons of stand-alone high-FSCORE portfolios, the MDD for the annually re-formed portfolio is marginally higher than it is for the comparable portfolio updated every third year. Consistent with our earlier evidence, the results based on 5-year holding period length are also worse with respect to MDDs than the corresponding results for the 1- and 3-year holding period lengths.
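As a minimal sketch of the MDD definition given above (the maximum percentage loss over any subinterval, measured on month-end values), the following Python function tracks the running peak and records the deepest decline from it. The function name `max_drawdown` and the sample portfolio values are hypothetical; the result is reported as a negative fraction, in line with the sign convention of the MDD figures quoted in the text.

```python
def max_drawdown(values):
    """Return the maximum drawdown of a series of portfolio values.

    The result is a non-positive fraction, e.g. -0.25 for a 25% loss
    from a previous peak to a later trough.
    """
    peak = values[0]
    mdd = 0.0
    for value in values:
        peak = max(peak, value)              # highest value seen so far
        mdd = min(mdd, value / peak - 1.0)   # deepest loss from that peak
    return mdd


# Hypothetical month-end portfolio values, for illustration only
month_end_values = [100, 120, 90, 95, 130, 104]
mdd = max_drawdown(month_end_values)
```

Here the deepest decline is from 120 to 90, so the later fall from 130 to 104 (20%) does not change the statistic.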
Based on the risk and performance statistics reported in Tables 2, 3, 4 and 5, the outperformance of the FSCORE-boosted portfolios over the market portfolio, as well as over the comparable primary portfolios, is not driven by their higher risk. Quite the contrary, the FSCORE boost seems to decrease portfolio risks, regardless of how these risks are quantified. This risk reduction characteristic stemming from the inclusion of a high-FSCORE threshold as a supplementary selection criterion is at its strongest among annually re-formed portfolios. Consistent with the results of Piotroski and So (2012), Duong et al. (2014), and Broussard et al. (2016), our overall results support the mispricing-based explanation as the reason for the observed outperformance of the high-FSCORE-related portfolios (including both stand-alone FSCORE and FSCORE-boosted portfolios).

Comparison of hit rates
For each of the 75 quantile portfolios, we calculate the hit rates that indicate the average proportion of stocks whose returns have been higher than the average stock market return, following Piotroski (2000), Bird and Casavecchia (2007), and Pätäri et al. (2018c). Table 6 shows that for all of the 12 primary portfolio-formation criteria, the inclusion of the high-FSCORE criterion enhances the hit rates when a 1-year holding period is employed. The same also holds for the 3- and 5-year holding periods, except in the case of the CFO/P portfolio formed with a 5-year updating frequency and in the cases of two composite value portfolios formed on the combinations of B/P and E/P, and S/P and CFO/P using a 3-year holding period length. On average, the hit rate improvement (by 5.70 percentage points) stemming from the inclusion of the FSCORE threshold is greatest for annually re-formed portfolios, but it also remains remarkable for the 3- and 5-year holding periods. The highest individual average hit rate (61.38%) is reported for the FSCORE-boosted low-ACCR portfolio with a 3-year re-formation frequency, which also generates the highest raw return, Sharpe ratio and SKASR, as well as the greatest 4-factor alpha among all of the 75 portfolios examined. Although the hit rates cannot be interpreted as measures of performance, the correlation of the hit rate ranking of the 75 portfolios examined is statistically significant with all six corresponding performance rankings, being highly significant even at its worst. 27 Overall, the average hit rates calculated for each holding period length are higher for both the FSCORE-boosted portfolios and the primary portfolios than the corresponding rates reported for similarly defined annually re-formed portfolios in the larger market-cap universe of German stocks in Pätäri et al. (2018c). The difference in hit rates in favor of our non-microcap sample is particularly great between the primary portfolios, among which the average hit rate for annually re-formed portfolios is 51.56%, whereas it is 47.08% according to the results of Pätäri et al. (2018c). This finding indicates that at least for the German sample data, the joint effect of removing microcaps from the investable universe of stocks and determining the number of stock series in quantile portfolios based on an FSCORE threshold of 7 instead of 8 improves the average hit rates.
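The hit rate described above can be illustrated with a short Python sketch. It assumes that the hit rate of one holding period is the share of portfolio stocks whose return exceeds the average stock market return of that period, and that the reported figure is the simple average of these shares across holding periods; the averaging scheme, the function names and the sample returns are assumptions made for illustration.

```python
def hit_rate(stock_returns, market_return):
    """Share of stocks whose holding-period return beats the market average."""
    hits = sum(1 for r in stock_returns if r > market_return)
    return hits / len(stock_returns)


def average_hit_rate(periods):
    """Simple average of per-period hit rates.

    `periods` is a list of (stock_returns, market_return) pairs, one per
    holding period. Equal weighting of periods is an assumption here.
    """
    rates = [hit_rate(returns, market) for returns, market in periods]
    return sum(rates) / len(rates)


# Hypothetical holding-period returns, for illustration only
periods = [
    ([0.10, -0.05, 0.03, 0.08], 0.04),  # 2 of 4 stocks beat the market
    ([0.02, 0.01], 0.00),               # 2 of 2 stocks beat the market
]
avg = average_hit_rate(periods)
```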
One possible reason for the outstandingly higher hit rates in these two studies, when compared to those reported in earlier studies, is that the earlier studies report hit rates for primary portfolios that consist of a much higher proportion of the investable stock universe, 28 whereas in these two studies, the number of stock series in primary quantile portfolios is limited to correspond to that of the comparable FSCORE-boosted portfolio. In this sense, the hit rate increase stemming from the inclusion of the high-FSCORE criterion documented in Pätäri et al. (2018c) and in this study provides more robust evidence for the favorable rightward shift of return distributions than earlier hit rate comparisons: in these two studies, the possibility that the hit rate increase stems from the primary portfolios containing many more stock series than the FSCORE-boosted portfolios to which they are compared is ruled out.
The hit rate statistics for plain FSCORE portfolios support the findings of the preceding performance analysis by showing that the good performance of the plain FSCORE portfolios is not based on a return-generation pattern in which the outperformance of a minority of high-FSCORE stocks would have overcompensated for the underperformance of the majority of the same group of stocks. Somewhat surprisingly, the highest average hit rate (58.95%) for FSCORE portfolios is obtained with a 5-year updating frequency, which indicates that the FSCORE is also feasible as a long-horizon selection criterion. However, the performance statistics reported in Table 2 reveal that the higher average hit rate of the FSCORE portfolio updated with a 5-year frequency (compared to the hit rates of more frequently re-formed FSCORE portfolios) does not necessarily imply a better portfolio performance: based on five of the six performance measures employed, the statistics are better for more frequently re-formed FSCORE portfolios than for the FSCORE portfolio updated with a 5-year frequency. However, the performance differences between the three plain FSCORE portfolios are not very dramatic in terms of any of the performance measures employed. At least based on this specific sample data, the above-average performance of high-FSCORE stocks seems to persist longer than has thus far been documented in the related literature. Interestingly, the relative performance of a plain FSCORE portfolio among the portfolios of equal holding period lengths is strongest among the portfolios with a 5-year updating frequency to the extent that its overall performance statistics are the best among all 25 comparable portfolios, including the FSCORE-boosted portfolios.

Impact of transaction costs on the relative performance of the portfolios
If transaction costs are omitted, the results for the portfolios updated at the same frequency remain qualitatively the same for all the examined holding-period lengths because the average turnover rates of such portfolios are quite close to one another.

Table 6 The average proportions of outperforming stocks in quantile portfolios. For each of the 75 long-only quantile portfolios examined, the table shows the average proportions (in percentages calculated over the 15-year sample period) of stocks whose returns have been higher than the average stock market return. The hit rates for 1-, 3-, and 5-year holding periods are shown in Panels A, B, and C, respectively. In each panel, the upper row shows the hit rates for the 12 primary portfolios, as well as for a stand-alone high-FSCORE portfolio (in the right-most column).

Although the turnover rates are, on average, marginally higher for the annually re-formed FSCORE-boosted portfolios than for the corresponding primary portfolios, the difference may get partially or fully compensated by lower average trading costs of the former type of portfolios, stemming from the fact that indirect price impact costs are higher in smaller-cap trades 29 (Based on SMB slopes, the inclusion of FSCORE besides a primary criterion tends to increase the average market cap of portfolio firms, thereby reducing the price impact from implementing trades). For the longer holding periods, the differences in average turnover rates as well as in SMB slopes are so small that the average transaction costs are practically the same for the primary and the corresponding FSCORE-boosted portfolios. However, the annual rebalancing of portfolios is clearly more costly than less frequent rebalancing, as the average cumulative transaction costs of the annually re-formed portfolios are approximately three (five) times as high as they are for the portfolios re-formed at a 3-year (5-year) frequency, thereby indicating that the average portfolio turnover rates are relatively stable regardless of whether the re-formation is done annually or every third (fifth) year.
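The reasoning behind the "three (five) times" observation can be made explicit with a small arithmetic sketch: if the turnover per re-formation and the cost per unit of turnover are roughly the same at every updating frequency, cumulative costs scale with the number of re-formations in the 15-year sample. The turnover rate of 40% and the cost rate of 0.5% below are purely hypothetical placeholders, as is the function name.

```python
def cumulative_cost(sample_years, holding_years, turnover, cost_rate):
    """Cumulative transaction cost as a fraction of portfolio value.

    Assumes the same turnover and cost rate at every re-formation and
    ignores compounding; both are simplifications of this sketch.
    """
    n_reformations = sample_years // holding_years
    return n_reformations * turnover * cost_rate


# Hypothetical inputs: 40% turnover per re-formation, 0.5% cost per unit traded
annual = cumulative_cost(15, 1, turnover=0.40, cost_rate=0.005)
three_year = cumulative_cost(15, 3, turnover=0.40, cost_rate=0.005)
five_year = cumulative_cost(15, 5, turnover=0.40, cost_rate=0.005)

ratio_vs_three_year = annual / three_year  # about 3
ratio_vs_five_year = annual / five_year    # about 5
```

Observed cost ratios close to three and five are therefore what one would expect when the per-re-formation turnover does not depend much on the holding period length.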
The overall low turnover rates of the examined portfolios also imply that the results would not radically change if higher, yet reasonable, transaction cost estimates were employed, although the performance comparisons between the portfolios updated with different frequencies would in such a case be somewhat more favorable to the less frequently re-formed portfolios. By setting the market-cap threshold for the sample firms at €50 M, we have aimed to exclude the possibility of a radical estimation error in transaction costs, as high relative transaction costs are mostly related to trading with microcap stocks (e.g., see Lesmond et al. 1999). Therefore, we believe that our main conclusions are robust to the level of the transaction-cost estimate, given that it is within a reasonable range. Based on size factor (SMB) exposures calculated on the basis of the 4-factor model employed, the examined portfolios do not differ radically in that some portfolios would have been heavily tilted toward large-cap stocks, whereas others would have had a completely reverse market-cap tilt (As shown in Tables 2 and 3, only seven statistically significant SMB slopes are documented. All these slopes are positive and reported for the portfolios formed on a primary criterion).

Conclusions and implications for further research
This paper examines the added value of using financial statement information for equity portfolio selection in the German stock market, particularly the added value of Piotroski's FSCORE, in a functionally realistic research setting where potential biases stemming from the omission of transaction costs, the microcap effect, and poor diversification are controlled. During the sample period from May 2000 to April 2015, when the average stock market return was exceptionally low in Germany, the performance of annually re-formed long-only portfolios formed on 12 fundamental-based portfolio-formation criteria would have, without exception, improved as a result of the inclusion of the high-FSCORE threshold as a supplementary criterion. We also contribute to the existing literature by showing that in spite of the FSCORE boost being strongest for the 1-year holding period length, the resulting performance enhancement also holds for the 3-year updating frequency. To the best of our knowledge, our study is the first to show that the FSCORE boost extends to 3-year horizons. The use of a 3-year holding period is particularly beneficial for the low-accrual portfolio that, when supplemented with the high-FSCORE threshold, generates the best overall performance among all 75 portfolios examined.
The decomposition of the full-sample-period returns into separate bull- and bear-period returns reveals that the better performance of FSCORE-related portfolios is mostly attributable to their tendency to decline far less in bearish conditions than the primary portfolios do and the market portfolio does, in particular. On average, the FSCORE-boosted portfolios that are re-formed annually or every third year also generate higher returns in bullish conditions than the primary portfolios and the market portfolio, although the comparable monthly return differences in favor of the FSCORE-boosted portfolios are remarkably smaller in bullish than in bearish conditions. Their defensive characteristic is also confirmed by the maximum drawdown statistics, the average of which is clearly smallest for the annually re-formed FSCORE-boosted portfolios, thereby indicating that for the shortest holding period length examined in this paper, the inclusion of the FSCORE threshold alongside a primary criterion also decreases the crash risk exposure of the related portfolios.
Our results also lend support to recent international evidence according to which investors could gain from lowering the high-FSCORE threshold level from 8 to 7 (e.g., see Choi and Sias 2012; Piotroski and So 2012; Hyde 2018). Such an extension is particularly important in order to avoid the insufficient diversification of the FSCORE-related portfolios in times when the total number of stocks with FSCOREs of 8 and 9 is low. The extension also enables the market-cap threshold to be increased so that the FSCORE boost could also be realized by investors other than small-scale ones. Our results also confirm recent findings that the FSCORE boost is not limited to high-B/P stocks (e.g., see Choi and Sias 2012; Piotroski and So 2012; Duong et al. 2014; Tikkanen and Äijö 2018), but it also extends to value stocks chosen on the basis of any of the examined valuation ratios, as well as to low-accrual stocks. The same also holds for the portfolios formed on several combination criteria. In addition, we show that the FSCORE is also efficient as a stand-alone criterion for selecting a long-only equity portfolio, as based on almost all risk-adjusted performance comparisons of the portfolios of equal holding period lengths, high-FSCORE portfolios outperformed all of the portfolios formed on the 12 primary criteria during the 15-year sample period from 2000 to 2015.
Motivated by recent strong evidence for the FSCORE criterion on a stand-alone basis (e.g., see Choi and Sias 2012;Duong et al. 2014;Ng and Shen 2020;Walkshäusl 2020), it might also be worthwhile for future studies to revise the research design so that the FSCORE is the primary criterion, being possibly supplemented by other selection criteria. In addition, the methods employed in this paper could be applied to other stock markets to uncover the extent to which our results based on the German sample data are generalizable.

Appendix 2: Calculation principles for the portfolio-formation criteria
The principles followed in the calculation of the components for the portfolio-formation criteria are as follows:
• Market Value of Equity (P): the stock price(s) multiplied by shares outstanding as of the end of April of year t.

$$\text{Sharpe ratio}_i = \frac{r_i - r_f}{\sigma_i}$$

where $r_i$ is the annualized geometric mean return of portfolio $i$, $r_f$ is the corresponding risk-free rate of return, and $\sigma_i$ is the annualized standard deviation of the monthly excess returns of portfolio $i$.
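A minimal Python sketch of a Sharpe ratio built from the three ingredients defined above is given below. Several details are assumptions of the sketch rather than statements about the paper: the risk-free rate is annualized geometrically in the same way as the portfolio return, the monthly standard deviation is the sample estimate, and it is annualized by multiplying by the square root of 12. The function names and the sample data are hypothetical.

```python
import math
import statistics


def annualized_geometric_mean(monthly_returns):
    """Annualized geometric mean of a series of monthly simple returns."""
    growth = math.prod(1 + r for r in monthly_returns)
    return growth ** (12 / len(monthly_returns)) - 1


def annualized_volatility(monthly_returns):
    """Sample standard deviation of monthly returns, scaled by sqrt(12)."""
    return statistics.stdev(monthly_returns) * math.sqrt(12)


def sharpe_ratio(portfolio, risk_free):
    """(r_i - r_f) / sigma_i, with sigma_i based on monthly excess returns."""
    excess = [p - f for p, f in zip(portfolio, risk_free)]
    r_i = annualized_geometric_mean(portfolio)
    r_f = annualized_geometric_mean(risk_free)
    return (r_i - r_f) / annualized_volatility(excess)


# Hypothetical monthly returns, for illustration only
portfolio = [0.03, -0.01] * 6
risk_free = [0.0] * 12
sr = sharpe_ratio(portfolio, risk_free)
```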

Skewness- and kurtosis-adjusted Sharpe ratio (SKASR)
The SKASRs are calculated to reveal the extent to which the total risk-adjusted performance rankings change if a risk proxy employed in the standard Sharpe ratio is adjusted to also take into account the third and fourth moments of the return distributions of trading portfolios. 32 The formula for the SKASR is as follows:

$$\text{SKASR}_i = \frac{r_i - r_f}{\text{SKAD}_i}$$

where $\text{SKAD}_i$ is the skewness- and kurtosis-adjusted standard deviation of the monthly excess returns of portfolio $i$. The SKAD captures the third and fourth moments of the return distributions being analysed. Based on the fourth-order Cornish-Fisher (1938) expansion, the adjusted Z value (i.e., $Z_{CF}$) that corresponds to the Z value of the normal distribution is first calculated as follows:

$$Z_{CF} = Z_c + \frac{1}{6}\left(Z_c^2 - 1\right)S + \frac{1}{24}\left(Z_c^3 - 3Z_c\right)K - \frac{1}{36}\left(2Z_c^3 - 5Z_c\right)S^2$$

where $Z_c$ is the critical value of the probability based on the standard normal distribution, $S$ refers to Fisher's skewness, and $K$ refers to the excess kurtosis of the return distribution. Next, we calculate SKAD by multiplying the standard deviation by the ratio $Z_{CF}/Z_c$. Consistent with Favre and Galéano (2002), we set $Z_c$ to −1.96 to correspond to a 95% probability level when determining this ratio.
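The SKAD adjustment described above can be sketched as follows. The code uses the commonly cited fourth-moment Cornish-Fisher expansion with skewness S and excess kurtosis K as inputs; that this exact form matches the expansion used in the paper is an assumption, and the function names and numerical inputs are hypothetical.

```python
def cornish_fisher_z(z_c, skew, excess_kurt):
    """Cornish-Fisher adjusted critical value Z_CF (assumed standard form)."""
    return (
        z_c
        + (z_c ** 2 - 1) * skew / 6
        + (z_c ** 3 - 3 * z_c) * excess_kurt / 24
        - (2 * z_c ** 3 - 5 * z_c) * skew ** 2 / 36
    )


def skad(std_dev, skew, excess_kurt, z_c=-1.96):
    """Skewness- and kurtosis-adjusted deviation: std_dev * Z_CF / Z_c."""
    return std_dev * cornish_fisher_z(z_c, skew, excess_kurt) / z_c


def skasr(excess_return, std_dev, skew, excess_kurt, z_c=-1.96):
    """Sharpe-type ratio with SKAD in place of the standard deviation."""
    return excess_return / skad(std_dev, skew, excess_kurt, z_c)


# Hypothetical inputs: 8% excess return, 20% volatility
normal_case = skasr(0.08, 0.20, skew=0.0, excess_kurt=0.0)
fat_left_tail = skasr(0.08, 0.20, skew=-0.5, excess_kurt=3.0)
```

With zero skewness and zero excess kurtosis the adjustment vanishes and the ratio equals the standard Sharpe ratio, whereas negative skewness or positive excess kurtosis inflates the risk proxy and lowers the ratio.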

Information ratio and SKAIR
The information ratio is defined in several alternative ways in financial literature (e.g., see Goodwin 1998, for an overview of such definitions). In this paper, we calculate it in its simplest form by dividing the portfolio's mean excess return relative to its benchmark portfolio by the volatility of that excess return:

$$\text{IR}_i = \frac{r_i - r_b}{\sigma_{ER}}$$

where $r_b$ is the annualized geometric mean return of the benchmark portfolio, and $\sigma_{ER}$ is the annualized standard deviation of the monthly excess returns (over the benchmark portfolio) of portfolio $i$ ($\sigma_{ER}$ is also known as tracking error).

32 The inclusion of higher moments of return distributions in the performance evaluation of equity portfolios is motivated by several recent studies (e.g., Zakamouline and Koekebakker 2009; Homm and Pigorsch 2012; León and Moreno 2017). (The reader should note that if an excess return distribution being analysed was strictly normal, both types of Sharpe ratio, as well as the corresponding risk metrics, would be exactly equal. Under the same assumption, the same also holds for both types of information ratio and for the related risk metrics.)
Because the information ratio is calculated analogously to the Sharpe ratio, with the only difference being in the definition of excess returns, it is also subject to a similar critique. Therefore, we also calculate the skewness- and kurtosis-adjusted information ratios (henceforth SKAIRs) for each of the examined trading portfolios similarly to the methodology employed for deriving the SKASRs based on the standard Sharpe ratios and the skewness and kurtosis statistics of the excess return distributions being evaluated.
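For illustration, the information ratio in its simplest form can be computed as in the Python sketch below, which divides the difference between the annualized geometric mean returns of the portfolio and the benchmark by the annualized tracking error. The sample standard deviation and the square-root-of-12 scaling are assumptions of this sketch, and the function names and data are hypothetical.

```python
import math
import statistics


def annualized_geometric_mean(monthly_returns):
    """Annualized geometric mean of a series of monthly simple returns."""
    growth = math.prod(1 + r for r in monthly_returns)
    return growth ** (12 / len(monthly_returns)) - 1


def tracking_error(portfolio, benchmark):
    """Annualized sample standard deviation of monthly active returns."""
    active = [p - b for p, b in zip(portfolio, benchmark)]
    return statistics.stdev(active) * math.sqrt(12)


def information_ratio(portfolio, benchmark):
    """(r_i - r_b) divided by the tracking error."""
    r_i = annualized_geometric_mean(portfolio)
    r_b = annualized_geometric_mean(benchmark)
    return (r_i - r_b) / tracking_error(portfolio, benchmark)


# Hypothetical monthly returns, for illustration only
portfolio = [0.03, -0.01] * 6
benchmark = [0.01, 0.00] * 6
ir = information_ratio(portfolio, benchmark)
```

A SKAIR-type variant would replace the tracking error with its skewness- and kurtosis-adjusted counterpart, in the same way as SKAD replaces the standard deviation in the SKASR.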

4-Factor alpha
To evaluate whether the potential abnormal returns are explained by four commonly used explanatory factors, we also calculate the 4-factor alphas for each quantile portfolio based on the following regression introduced by Carhart (1997):

$$r_{it} - r_{ft} = \alpha_i + b_i\,(r_{mt} - r_{ft}) + s_i\,SMB_t + h_i\,HML_t + m_i\,WML_t + \varepsilon_{it} \tag{5}$$

where $r_{it}$ is the return of a portfolio, $r_{ft}$ is the risk-free rate of return, $\alpha_i$ is the 4-factor alpha (the abnormal return over and above what might be expected based on the 4-factor model employed), $r_{mt}$ is the stock market return, $SMB_t$ is the return of the size factor (i.e., the return difference between small- and large-cap portfolios), $HML_t$ is the return of the B/P factor (i.e., the return difference between high- and low-B/P portfolios), $WML_t$ is the return of the momentum factor (i.e., the return difference between winner and loser stock portfolios), $b_i$, $s_i$, $h_i$, and $m_i$ are factor exposures to the stock market, SMB, HML and WML factors, respectively, and $\varepsilon_{it}$ is the residual term.

Because of the weighting system employed in the portfolio-formation, we use equal-weighted returns of all sample stocks as a proxy for the market return. The other factors are also calculated on the basis of equal-weighted returns, 33 otherwise following the methodology employed by Fama and French (1993) for the construction of the SMB and HML factors, and that of Fama and French (2012) for the construction of the WML factor.

33 After careful consideration of various factor sets, we find that these types of factors produce the highest adjusted R-squareds as well as the lowest alphas for the portfolios being examined (see Brückner et al. (2015) for an analytic comparison of alternative German factor sets). Of the two alternative quantile return time-series for each factor-based quantile-division criterion calculated with and without the corporate tax credit (provided by Stehle et al. 2016), we chose the latter, since the incremental return stemming from the stock owners' compensation of the corporate tax that the firm has paid for dividend payouts is not taken into account in the total return calculations of equities in Datastream. In addition, such an imputation credit was only available for the German shareholders for dividends paid by the German companies until the system was discontinued in October 2000 (e.g., see Stehle and Schmidt (2015) for details). The quantile time series employed in the calculation of the factor returns are downloadable at: https://www.wiwi.hu-berlin.de/de/professuren/bwl/bb/data/fama-french-factors-germany/fama-french-factors-for-germany.
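As a rough illustration of how the 4-factor alpha in regression (5) can be estimated, the sketch below runs an ordinary least squares regression of portfolio excess returns on the four factors using only the Python standard library. It solves the normal equations directly, which is adequate for a small example but is not a statement about the estimation software used in the paper; the function names and the synthetic factor data are hypothetical.

```python
def ols(y, X):
    """OLS coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant for the intercept
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def four_factor_alpha(r_p, r_f, r_m, smb, hml, wml):
    """Estimate alpha and the factor exposures b, s, h, m of regression (5)."""
    y = [p - f for p, f in zip(r_p, r_f)]
    X = [[m - f, s, h, w] for m, f, s, h, w in zip(r_m, r_f, smb, hml, wml)]
    alpha, b, s, h, m = ols(y, X)
    return {"alpha": alpha, "b": b, "s": s, "h": h, "m": m}


# Synthetic monthly data with known coefficients, for illustration only
r_f = [0.001] * 8
mkt_ex = [0.02, -0.03, 0.01, 0.04, -0.02, 0.00, 0.03, -0.01]
smb = [0.01, 0.00, -0.02, 0.01, 0.02, -0.01, 0.00, 0.01]
hml = [0.00, 0.02, 0.01, -0.01, 0.00, 0.02, -0.02, 0.01]
wml = [0.03, -0.01, 0.00, 0.02, -0.03, 0.01, 0.01, -0.02]
r_m = [x + f for x, f in zip(mkt_ex, r_f)]
r_p = [
    f + 0.002 + 0.9 * x + 0.3 * s + 0.5 * h - 0.2 * w
    for f, x, s, h, w in zip(r_f, mkt_ex, smb, hml, wml)
]
fit = four_factor_alpha(r_p, r_f, r_m, smb, hml, wml)
```

Because the synthetic portfolio returns are generated without noise, the regression recovers the chosen coefficients; with real data, standard errors and t-statistics would also be needed to judge the significance of the alpha and of the factor slopes.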