Threshold-dependent tax enforcement and the size distribution of firms: evidence from Germany

This paper investigates firms’ responses to threshold-dependent intensity of tax enforcement. We use administrative tax return data over the entire population of German firms and exploit industry variation in firm size thresholds applied by the tax administration. In our setting, each threshold marks a considerable spike in audit intensity and hence should create strong incentives to bunch below the threshold. However, we find no such effect in our large sample analysis. We attribute this empirical observation to optimization costs, particularly to the costs associated with the operational implementation of size management and to information costs. Our paper adds to the emerging field of studies on potential distortions created by threshold-dependent firm regulation. The findings are also relevant for policymakers, as they suggest that the specific design of threshold-dependent policies might allow governments to increase the efficiency of tax audits without distorting the firm size distribution.


Introduction
Large firms are subject to higher audit intensity from tax administrations than small firms (Bachas et al., 2019) because governments segment taxpayers by firm size in order to increase the efficiency of tax audits. Naturally, a tax audit is costly for the firm, as handling the auditor creates compliance costs and any tax audit creates a nonzero probability of additional tax claims, interest payments and penalty fees. Hence, firms have incentives to avoid greater audit intensity. When audit intensity depends on firm size, firms have reason to strategically bunch below firm size thresholds (FSTs) through size management. The respective FSTs are often made publicly available by tax administrations. However, size management distorts the firm size distribution and has negative effects on welfare. Specifically, the firms' costs of size management result in a deadweight loss, and size management reduces firms' future economic performance and, consequently, overall economic growth. It also decreases aggregate productivity because of inefficient resource allocation.
Despite the adverse effects that result from size management, research on this subject is scarce. To our knowledge, only two studies analyze how firms respond to threshold-dependent tax enforcement on the microlevel. First, Almunia and Lopez-Rodriguez (2018) find significant downward size management by Spanish firms and present evidence that underreporting of revenue is the key channel for this phenomenon in their setting. Second, Tennant and Tracey (2019) examine a threshold-dependent policy in Jamaica. In contrast to the results by Almunia and Lopez-Rodriguez (2018), Tennant and Tracey (2019) find no size management around the FST.
However, prior research has shown size management in many areas of taxation other than tax enforcement. For instance, Hoopes et al. (2018) analyze the responses of Australian firms to the threshold-dependent intensity of tax return disclosure. They find that firms manage their size to avoid disclosure. Further research has shown that size management at FSTs is related to kinks in corporate income tax (CIT) (Brockmeyer, 2014; Devereux et al., 2014), CIT notches (Bachas and Soto, 2021), CIT benefits (Hosono et al., 2018) and special CIT regimes for SMEs (Agostini et al., 2018), minimum CIT schemes (Best et al., 2015) and exemptions in value added tax (VAT) (Harju et al., 2016; Liu et al., 2019; Onji, 2009). Moreover, size management has also been found in an array of nontax areas, e.g., mandatory IFRS reporting (Asatryan and Peichl, 2017), financial audit and disclosure requirements (Bernard et al., 2018) and labor regulation (Garicano et al., 2016).
We use administrative microlevel tax return data to study size management for the entire population of German firms. These firms face threshold-dependent discontinuities in audit intensity. Specifically, the German tax administration segments firms into four size classes based on FSTs: very small (VS-class), small (S-class), medium (M-class) and large (L-class). Firms are assigned to a particular size class if their size exceeds either the respective FST for profit or for revenue (or both). The FSTs vary between industries (and increase continuously over time). Audit intensity between size classes varies most notably in terms of audit probability. For instance, in 2010, 21.1% of firms assigned to the L-class were audited as opposed to only 6.9% of firms in the M-class. In the S-class and the VS-class, the audit probabilities were only 3.5% and 1%, respectively (German Federal Ministry of Finance, 2011). Both the FSTs and the corresponding audit probabilities are regularly published online on the website of the Federal Ministry of Finance and in the Federal Gazette. In addition to audit probability, audit intensity between size classes also varies in terms of audit quality. First, administrative regulation dictates that for L-class firms, the audit must be consecutive, i.e., once an audit occurs, it must cover all years that were not covered by the previous audit for that firm. In contrast, for M-class, S-class and VS-class firms, the audit period cannot exceed three calendar years. Second, the skill level of the auditor and the specialization level of audit teams increase systematically with the size class.
Our results imply an absence of size management in our data. Although, naturally, the null of no size management cannot be proven, a type II error is unlikely in our setting. First, our dataset is large, with approximately 2.7 million firms included. This substantially reduces the probability of making a type II error, even in our most granular subsample analysis, in which we search for size management in individual industries. Furthermore, as we rely on administrative data, we arguably face negligible measurement error and no selection bias. Finally, the results do not seem to be driven by our specific empirical strategy, as we obtain structurally equivalent results when applying an array of alternative tests.
We contribute to the emerging field of studies on potential distortions created by threshold-dependent firm regulation by showing that firms in our setting do not react to FSTs by size management despite strong incentives to the contrary. We posit that the absence of size management results from optimization costs in the form of adjustment costs and information costs. The results we find for Germany contradict the results found by Almunia and Lopez-Rodriguez (2018) for Spain despite both countries being relatively similar in relevant drivers of optimization costs. Specifically, the two are similarly developed countries located in Western Europe, do not differ substantially in terms of the level of trust in public institutions and have similar tax rates. Despite these similarities, Germany and Spain differ in the specific design of their threshold-dependent enforcement regime. We argue that Germany's specific implementation of multiple criteria for segmentation, multiple size classes, regular adjustments of FSTs and industry-specific FSTs increases optimization costs and, hence, can inhibit tax-induced size management. Moreover, the results by Tennant and Tracey (2019) on firms in Jamaica, where FSTs are based on a combination of taxes paid and revenue, indicate that a more multilayered threshold-dependent policy improves firms' tax compliance as measured by both reported profitability and effective tax rates without causing tax-induced size management.
Overall, this field of research is relevant for policymakers, as the results suggest that the specific design of threshold-dependent policies might allow governments to increase the efficiency of tax audits while not distorting the firm size distribution and, hence, avoid the negative effects of size management on welfare.
The remainder of this paper is organized as follows. Section 2 outlines the effects of threshold-dependent tax enforcement and the rationale of tax-induced size management. Section 3 provides information on the German tax enforcement regime. Section 4 develops our hypotheses, and Sect. 5 describes the empirical strategy. Section 6 presents information on data and on sample selection. Section 7 provides the main empirical results as well as a discussion of them. Section 8 contains robustness tests, and Sect. 9 concludes.

Literature and theoretical discussion

Effects of threshold-dependent tax enforcement
Governments worldwide focus their audit resources on large business taxpayers. Specifically, approximately 85% of the world's 60 largest economies segment firms into size classes based on FSTs and apply higher audit intensities to firms in the upper size classes (OECD, 2015). The major reason for the establishment of threshold-dependent policies is that they are believed to improve the efficiency of tax audits and preserve audit budgets. Furthermore, these policies aim to secure the integrity of the tax system, as larger taxpayers bear higher compliance risks than do smaller taxpayers (OECD, 2017). Operationally, most tax administrations differentiate between two size classes, and the FSTs applied to segment taxpayers are usually based on revenue, profit, total assets, taxes paid, the number of employees or a combination of these factors. The majority of tax administrations make the respective FSTs publicly available. On average, in countries that rely on threshold-dependent enforcement, firms exposed to the highest level of audit intensity provide 35-50% of the total tax revenue collected while representing less than 10% of all active firms (OECD, 2017). Focusing audit resources on a relatively small number of large firms appears efficient. There is also ample empirical evidence suggesting that tax compliance increases with audit intensity (Alm, 2019). However, as shown by Alm et al. (2009), higher audit intensity has a positive impact on compliance only if taxpayers are well informed that they face a higher audit intensity. Hence, publicly available information about FSTs jointly with the respective historical audit rates, as an indicator for audit probability, can have positive effects on compliance.
As tax audits usually cause substantial costs for the audited firm, public information about FST levels may also trigger a size management response. Specifically, if firms above an FST face a significantly higher audit intensity than firms located below this FST, threshold-dependent enforcement policies create incentives to manage size below the FST. However, size management distorts the firm size distribution and has negative effects on welfare for several reasons. First, the firms' costs of size management represent an allocative inefficiency and thus result in a deadweight loss (Almunia and Lopez-Rodriguez, 2018). Second, as firms that manage their size in one period will also manage their size in future periods, size management has negative effects on firms' future economic performance (Roychowdhury, 2006) and, consequently, overall economic growth. Third and finally, size management also results in inefficient resource allocation and decreases aggregate productivity (Harju et al., 2016).
Despite these negative effects on firms, research on this subject is scarce. On the macroeconomic level, Vehorn (2011) analyzes the impact of threshold-dependent tax enforcement policies in developing economies. The results show that 43% of countries experienced a decline in tax revenue (standardized by GDP) after the implementation of such policies, indicating adverse effects of threshold-dependent enforcement policies. On the microeconomic level, two studies analyze how firms respond to threshold-dependent tax enforcement. Both studies specifically investigate so-called large taxpayer units (LTUs), which are responsible for monitoring larger taxpayers. Firms are selected for LTU treatment when their size exceeds specific FSTs. First, Almunia and Lopez-Rodriguez (2018) find significant downward size management by Spanish firms at the revenue-based FST. Their results indicate that size management in their setting is predominantly conducted by underreporting rather than decreasing real activity. The results also indicate that the extent of tax-induced size management varies between industries conditional on the traceability of transactions due to third-party reporting. Traceability naturally determines the effectiveness of tax audits. Second, Tennant and Tracey (2019) examine an LTU policy in Jamaica, where FSTs are based on a combination of taxes paid and revenue. Their results indicate that LTU treatment significantly improves firms' tax compliance as measured by both reported profitability and effective tax rates. Contrary to the results by Almunia and Lopez-Rodriguez (2018), Tennant and Tracey (2019) find no size management at the FSTs. Overall, despite the widespread adoption of threshold-dependent enforcement regimes, the effects of such policies remain unclear.

Rationale
We define tax-induced size management as any activity undertaken to manage firm size below an FST in order to reduce the firm's audit intensity, regardless of whether this activity is legal or illegal. Consistent with prior literature on notches in the tax system, e.g., Kanbur and Keen (2014), we argue that three nonmutually exclusive groups of size management strategies exist. First, firms can genuinely reduce their size by decreasing their real activity (also referred to as real production response). Second, firms can report a smaller size by using available discretion in accounting rules. For instance, firms can defer the recognition of revenue, create accruals or use special depreciations. Alternatively, firms can also split their operations into two or more individual legal entities (also referred to as tax-motivated splitting by Slemrod (2016)). Third, firms can report a smaller size by misreporting, e.g., by underreporting revenue or overreporting the cost of goods sold.
Regardless of the specific size management strategy, profit-maximizing firms engage in size management only as far as the benefits of size management exceed the resulting costs of size management (hereinafter referred to as optimization costs). The most notable benefit of size management is the decrease in expected costs from audits (hereinafter referred to as expected firm audit costs) when comparing the two scenarios of firms just below and just above the FST. Consequently, if optimization costs exceed the decrease in expected firm audit costs around the FST, the threshold-dependent enforcement regime is not expected to distort the firm size distribution.

Expected firm audit costs
Expected firm audit costs can be defined as the costs that arise once a firm is audited (hereinafter referred to as conditional firm audit costs) multiplied by the probability that an audit occurs. Conditional firm audit costs represent a part of firms' total tax costs and consist of additional tax claims, interest payments, penalty fees and compliance costs. The first three elements are naturally conditional on detection and vary substantially in the cross section. As an example, variation between industries is conditional on the traceability of transactions under third-party reporting and hence conditional on the expected effectiveness of tax audits (Almunia and Lopez-Rodriguez, 2018). In contrast, considering the last element, compliance costs occur even if a firm is fully compliant. Compliance costs include costs of tax consulting services and administrative costs, i.e., the costs of employee resources allocated to the audit.
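The definition above and the decision rule from the rationale can be written compactly. The notation below is purely illustrative and is not taken from the paper: p_i denotes the audit probability of firm i, C_i^cond its conditional firm audit costs, and K_i its optimization costs.

```latex
% Illustrative notation (an assumption, not the paper's own symbols):
%   p_i          audit probability of firm i
%   C_i^{cond}   conditional firm audit costs (tax claims, interest, penalties, compliance costs)
%   K_i          optimization costs of size management
\begin{align*}
  \mathbb{E}[C_i] &= p_i \cdot C_i^{\mathrm{cond}}, \\
  \text{size management pays off only if}\quad
  \mathbb{E}[C_i]^{\mathrm{above}} - \mathbb{E}[C_i]^{\mathrm{below}} &> K_i ,
\end{align*}
% where "above" and "below" refer to the firm's position relative to the FST.
```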

Optimization costs
Optimization costs in the context of tax enforcement can be divided into adjustment costs and information costs. Whereas adjustment costs refer to the costs of operationally implementing size management, e.g., resource costs and opportunity costs of size management (Almunia and Lopez-Rodriguez, 2018), information costs result from gathering relevant information on the tax system, particularly on the threshold-dependent enforcement regime.
Adjustment costs are conditional on the criteria applied for segmentation. Specifically, size management in general is relatively difficult, as firms face uncertainty with respect to business outcomes throughout the year. However, while profit can often be adjusted through additional expenditures at the "last minute" when uncertainty declines by the end of the year (Asatryan et al., 2018), revenue, for example, is much more difficult to adjust. Correspondingly, revenue is applied for segmentation in approximately 70% of the countries that rely on threshold-dependent enforcement (OECD, 2017). Additionally, if multiple criteria have to be taken into account, size management becomes considerably more difficult and more time-consuming, which consequently increases adjustment costs. Threshold-dependent enforcement based on multiple criteria is applied, e.g., in Denmark, Sweden, Germany, Turkey, Russia, Brazil and India (OECD, 2015).
Furthermore, adjustment costs vary in the cross section due to firm-specific heterogeneity. Specifically, as the costs of operationally implementing size management are mostly variable costs (Almunia and Lopez-Rodriguez, 2018), adjustment costs are conditional on the amount by which true, i.e., unmanaged, firm size exceeds the FST. Additional firm-specific heterogeneity in adjustment costs results from internal coordination costs and the quality of a firm's internal information environment (Gallemore and Labro, 2015). Moreover, the level of trust in public institutions affects adjustment costs via social norms. Specifically, a high level of trust in public institutions affects social norms by reducing the willingness of employees to become involved in presumably illegitimate activities (Alm, 2019). Since size management is likely considered illegitimate and because it requires coordination between various employees within a firm, a high level of trust in public institutions increases the adjustment costs of size management.
Information costs are conditional primarily on the amount of information that has to be taken into account by firms to be able to consider all the relevant aspects of an enforcement regime, specifically the segmentation of taxpayers and the audit selection process. Hence, information costs are conditional on the complexity of the threshold-dependent enforcement regime. Imperfect information resulting from information costs can prevent taxpayers from optimal behavior, a phenomenon referred to as inattention in the prior literature (Bosch et al., 2019; Kleven and Waseem, 2013; Kosonen and Matikka, 2019; Søgaard, 2019). For instance, according to prior research, taxpayers seem to have systematic misperceptions of their average and marginal tax rate, leading to suboptimal tax decisions. This scenario applies to individuals (Brown, 1969; Fujii and Hawley, 1988) as well as to firms (Graham et al., 2017). Furthermore, there is overwhelming evidence that taxpayers tend to subjectively overestimate low probabilities in tax settings, such as the probability of being audited (Alm, 2019).

Overview
The tax administration in Germany is decentralized to the level of the 16 states. Nonetheless, most taxes are shared between the federal government and the state governments (e.g., personal income tax (PIT), CIT and VAT). Operational tax collection and tax enforcement are conducted by local tax offices, mostly on the level of Germany's approximately 400 districts, and are under supervision by the states' ministries of finance. Comparability of tax enforcement across states is ensured by federal courts and by binding administrative regulations issued by the Federal Ministry of Finance. However, states may differ particularly in the resources that are available for audits.

Firm size thresholds
Germany aims to increase the resource efficiency of its tax audits by segmenting firms and by applying different levels of audit intensity to each segment. To this end, firm size is the most relevant segmentation criterion. Specifically, firms are segmented into four size classes (VS-, S-, M- and L-class) based on FSTs that refer to individual legal entities. FSTs in Germany vary between industries. Specifically, the German tax administration differentiates between four main audit industry clusters (AICs): trading, manufacturing, freelancing and services. Every three years, i.e., at the beginning of each segmentation cycle, firms that belong to one of these AICs are assigned to a specific size class if their size exceeds either the respective FST for profit or for revenue (or both). For each segmentation, the tax administration uses information on profit from CIT returns or PIT returns and information on revenue from VAT returns to assign firms to one of the size classes. For the segmentation cycle starting in t, the profit and revenue information commonly derive from tax returns for the year t − 3 or the year t − 2. However, firms cannot know which year's tax return will be used for segmentation. Consequently, firms that intend to engage in size management need to ensure that they do not exceed the respective FST for profit and for revenue in both t − 3 and t − 2. Furthermore, FSTs are marginally adjusted prior to each segmentation cycle. Although the adjusted FSTs of each segmentation cycle are made publicly available online on the website of the Federal Ministry of Finance and in the Federal Gazette shortly before the segmentation, firms in t − 3 and t − 2 do not know the exact FSTs that will be applied in the next segmentation cycle starting in t. Table 1 reports the FSTs for the main AICs. All FSTs invariably increase over time in terms of both profit and revenue.
As an example, the VSS-FST (the threshold between the VS-class and the S-class) for trading firms in 2004 for profit (revenue) was 30,000 (145,000) euros, the SM-FST (between the S-class and the M-class) was 47,000 (760,000) euros, and the ML-FST (between the M-class and the L-class) was 244,000 (6,250,000) euros. By 2010, the VSS-FST increased to 34,000 (160,000) euros, the SM-FST to 53,000 (840,000) euros and the ML-FST to 265,000 (6,900,000) euros. Table 2 reports the euro and percentage changes (in parentheses) in FSTs from 2004 to 2007 (Panel A) and from 2007 to 2010 (Panel B) for the main AICs using the information reported in Table 1.
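The assignment rule can be illustrated with a short sketch. It is not the tax administration's procedure: the function name and data layout are hypothetical, the thresholds are the 2010 trading-firm FSTs quoted above, and it assumes that a firm lands in the highest class whose lower FST it strictly exceeds on profit or on revenue.

```python
# 2010 FSTs for trading firms in euros, as quoted in the text.
# Each entry: (class entered when the FST is exceeded, profit FST, revenue FST).
# Ordered from the highest threshold to the lowest.
TRADING_FSTS_2010 = [
    ("L", 265_000, 6_900_000),  # ML-FST
    ("M", 53_000, 840_000),     # SM-FST
    ("S", 34_000, 160_000),     # VSS-FST
]


def assign_size_class(profit, revenue, fsts=TRADING_FSTS_2010):
    """Return the highest size class whose FST is exceeded on profit OR revenue.

    Assumption (illustrative): 'exceeds' means strictly greater than the FST.
    """
    for size_class, profit_fst, revenue_fst in fsts:
        if profit > profit_fst or revenue > revenue_fst:
            return size_class
    return "VS"  # no threshold exceeded


# A firm with low profit but revenue above the SM-FST is still an M-class firm.
print(assign_size_class(profit=30_000, revenue=900_000))
```

Because either criterion suffices, a firm that wants to stay in a lower class has to keep both profit and revenue at or below the respective FSTs, which is the multiple-criteria feature the text emphasizes.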
Across all AICs, neither the percentage nor the euro adjustments of the FSTs are consistent over time. For instance, the VSS-FST for trading firms increased by 2,000 (10,000) euros, the SM-FST by 3,000 (40,000) euros and the ML-FST by 6,000 (250,000) euros for profit (revenue) from 2004 to 2007. From 2007 to 2010, the VSS-FST increased by 2,000 (5,000) euros, the SM-FST by 3,000 (40,000) euros and the ML-FST by 15,000 (400,000) euros. In relative terms, the increases range from 2.5% to 6.9%. Consequently, it is not possible for firms to exactly predict the FSTs that will be applied in the next segmentation cycle. However, firms are aware of FSTs applied for the current segmentation cycle, and FSTs have historically never decreased. Consequently, firms using a conservative approach will rationally manage their size to the FSTs last made publicly available, i.e., FSTs applied for the current segmentation cycle.
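The trading-firm figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the 2004 levels and the euro adjustments quoted in the text; the 2007 levels are derived here rather than quoted, and all names are hypothetical.

```python
# 2004 FST levels for trading firms as (profit, revenue) in euros, from the text.
fst_2004 = {"VSS": (30_000, 145_000), "SM": (47_000, 760_000), "ML": (244_000, 6_250_000)}

# Euro adjustments as (profit, revenue), from the text.
delta_2004_2007 = {"VSS": (2_000, 10_000), "SM": (3_000, 40_000), "ML": (6_000, 250_000)}
delta_2007_2010 = {"VSS": (2_000, 5_000), "SM": (3_000, 40_000), "ML": (15_000, 400_000)}


def add_deltas(levels, deltas):
    """Apply the euro adjustments to each (profit, revenue) pair."""
    return {k: tuple(l + d for l, d in zip(levels[k], deltas[k])) for k in levels}


def pct_changes(levels, deltas):
    """Percentage change of every profit and revenue FST relative to its base level."""
    return [100 * d / l for k in levels for l, d in zip(levels[k], deltas[k])]


fst_2007 = add_deltas(fst_2004, delta_2004_2007)   # derived, not quoted in the text
fst_2010 = add_deltas(fst_2007, delta_2007_2010)   # should match the quoted 2010 levels

all_pct = pct_changes(fst_2004, delta_2004_2007) + pct_changes(fst_2007, delta_2007_2010)
print(fst_2010)
print(f"relative increases: {min(all_pct):.1f}% to {max(all_pct):.1f}%")
```

The derived 2010 levels can be compared with the 2010 FSTs quoted for trading firms, and the smallest and largest relative increases with the range stated above.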

Audit probability
A firm's size class strongly affects its audit probability due to the specific design of the audit target selection process, which relies on (1) risk-dependent selection, (2) random selection and (3) time-dependent selection (Harle and Olles, 2017; Wenzig, 2014). First, under risk-dependent selection, firms are selected for audit based on firm-specific risk factors identified from entries in tax returns. These risk factors include, e.g., foreign business activities, loss carry-forwards and deviations from industry averages. Second, under random selection, firms are drawn randomly and independently of firm-specific characteristics. Specifically, within each size class, a number of firms are drawn randomly to reduce the predictability of audits. Finally, and most important, under time-dependent selection, firms are selected regardless of their firm-specific characteristics but only according to binding target intervals at which firms in each size class must be audited. These target intervals differ across size classes and are three to four years for L-class firms, 8.5 to 10.5 years for M-class firms and 14.4 to 20 years for S-class firms. For VS-class firms, no target interval is set (Bavarian General Accounting Office, 2013; Kaligin, 2014). Despite a slight increase in the application of risk-dependent selection since the introduction of automated risk management systems in recent years, time-dependent selection remains the most important component of the target selection process in Germany (German Bundestag, 2021; Klein and Rüsken, 2020). Because time-dependent selection depends exclusively on a firm's size class, size class is the major determinant of audit probability. Furthermore, as target intervals differ across the size classes, audit probability changes discontinuously at FSTs. Consistent with this, eight out of nine tax consulting professionals consider a firm's size class as the major determinant of audit probability in Germany.
Note that the amount by which a respective FST is exceeded is irrelevant for size class segmentation and that individual auditors have little discretion in selecting firms because audit schedules are established at the level of local tax offices according to the target selection process described above. Nevertheless, due to risk-dependent selection, and the fact that firm size is presumably positively correlated with some risk factors, audit probability positively correlates with firm size within size classes. This may attenuate the discontinuities in audit probability at FSTs to a certain degree. However, because time-dependent selection represents by far the most important component of the target selection process and because target intervals vary substantially across size classes, it is very unlikely that risk-dependent selection would completely eliminate the jumps in audit probability.
Historical audit rates conditional on size class are made publicly available on an annual basis by the Federal Ministry of Finance. As audit rates remain virtually unchanged over time, they provide a reliable estimate of firms' audit probabilities. According to several rulings of the German Federal Finance Court, the differences in audit rates across size classes do not violate the principle of equality of the German constitution because the tax administration is allowed to segment taxpayers for an effective use of its limited resources. 12

Audit quality
Size class also affects audit quality. Specifically, administrative regulation dictates that for L-class firms, audits must be consecutive, i.e., once a firm is audited, the audit must cover all years that were not covered by the previous audit of that firm, whereas for M-class, S-class and VS-class firms, the audit period must not exceed three calendar years. Moreover, more-experienced and better-trained auditors are generally assigned to larger cases (Bavarian General Accounting Office, 2013). Additionally, the size and specialization of audit teams increase with the audited firm's size class. Furthermore, the Federal Central Tax Office regularly assigns additional federal auditors to audits of mainly L-class firms. Table 3 reports historical audit rates, audit periods and additional tax revenues generated from audits (consisting of additional tax claims, interest payments and penalty fees) per size class for the years 2004 (Panel A), 2007 (Panel B) and 2010 (Panel C).

Audit outcomes
The majority of German firms are assigned to the VS-class, which is expected. For instance, in 2010, 74.6% of firms were assigned to the VS-class, 13.9% to the S-class, 9.3% to the M-class and 2.2% to the L-class. Furthermore, it can be seen that audit rates change strongly at FSTs. In 2010, 1.0% of firms in the VS-class, 3.5% of firms in the S-class, 6.9% of firms in the M-class and 21.1% of firms in the L-class were audited. On average, an audit covered 2.9 calendar years in the VS-class and the S-class, 3.0 years in the M-class and 3.3 calendar years in the L-class.
Consequently, 70.8% of the additional tax revenue of 16.8 billion euros was derived from audits of L-class firms in 2010. This corresponds to 293,813 euros per audited firm. However, with 15,013 (16,878) [23,502] euros, the average additional tax revenue per audited firm was economically significant for the VS-class (S-class) [M-class] as well. 13

Benefits of size management
As discussed in Sect. 2.2.1, firms engage in size management around FSTs only if the benefits of size management, i.e., the difference in expected firm audit costs just above and just below the FST, exceed optimization costs. To provide some indication of the extent of the benefits of size management, we conduct a simple back-of-the-envelope calculation.
12 For instance, see German Federal Court of Finance (1988).
13 A recent survey among German firm managers shows that approximately 75% of all audits result in additional tax revenue (PricewaterhouseCoopers, 2019).
First, we assume that conditional firm audit costs do not change strongly at FSTs, i.e., between size classes. This is a simplification, as size class particularly affects audit quality (see Sect. 3.4). Under this assumption, the decrease in expected firm audit costs that is caused by size management results merely from the discontinuous changes in audit probability at FSTs. We use the average additional tax revenue per audited firm in 2010 from Table 3 as a proxy for conditional firm audit costs, specifically, additional tax claims, interest payments and penalty fees, for the average firm in each size class. We further make the simplifying assumption that the profit of the average firm in each size class corresponds to the midpoint of that size class. The profits of the smallest and the largest firm in each size class are defined by the FSTs for trading firms for the segmentation cycle starting in 2010 from Table 1. For the VS-class, we assume that the smallest firm in that size class makes a profit of zero, and for the L-class, we assume that the largest firm makes a profit of ten million euros. We divide the conditional firm audit costs for the average firm in each size class by the profit of the average firm to obtain the ratio of conditional firm audit costs to profits for the average firm in each size class. To obtain the ratio of conditional firm audit costs to profits at FSTs, we calculate the mean of the ratio of conditional firm audit costs to profits of the average firm in the size class to the left and to the right of the respective FST. To finally derive the ratio of expected firm audit costs to profits at FSTs, we multiply the ratio of conditional firm audit costs to profits at FSTs by the audit rates, i.e., a proxy for audit probabilities, in the size class to the left and to the right of the respective FST.
As audits usually cover more than one calendar year, we multiply audit rates by the average audit period in each size class in 2010 from Table 3 to obtain proxies for the probability that the tax return for a single year will be audited. Correspondingly, we divide conditional firm audit costs and expected firm audit costs by the average audit period to obtain conditional and expected audit costs per year.
To account for the possibility that audit probability is correlated with firm size within size classes, we assume that audit rates for the smallest (largest) firms in every size class correspond to 90% (130%) of the average audit rate in that size class in 2010 from Table 3. Figure 1 shows the ratio of conditional firm audit costs to profits (dot markers) for the average firm in the VS-class, S-class, M-class and L-class and at the VSS-FST, SM-FST and ML-FST (solid horizontal lines) under these assumptions. The short-dashed line represents a trend line of the ratio of conditional firm audit costs to profits based on a third-order polynomial. The dash-dotted line indicates the audit probability. Finally, the solid line shows the ratio of expected firm audit costs to profits at FSTs, i.e., the ratio of conditional firm audit costs to profits multiplied by the audit rate in the respective size classes.
The ratio of conditional firm audit costs to profits is 30.5% (13.4%) [4.9%] {1.7%} for the average firm in the VS-class (S-class) [M-class] {L-class}. Hence, our estimates indicate that the ratio of conditional firm audit costs to profits is decreasing with firm size. This is plausible for two reasons. First, larger firms tend to have more tax expertise and hence likely engage in more sound tax avoidance compared to smaller firms (Chen et al., 2010). As more sound tax avoidance is less likely to be objected to by the tax administration, this leads to a lower ratio of conditional firm audit costs to profits. Second, a decreasing ratio of conditional firm audit costs to profits is also consistent with the political cost hypothesis. The political cost hypothesis predicts that larger firms take less aggressive tax positions (Gupta and Newberry, 1997; Zimmerman, 1983). Less aggressive tax positions also imply a lower probability of objections by the tax administration and hence lower conditional firm audit costs relative to profits. The ratio of conditional firm audit costs to profits is 21.9% at the VSS-FST. Accordingly, the ratio of expected firm audit costs to profits is 0.8% just below the VSS-FST, where audit probability is 3.8%, and 2.0% just above the VSS-FST, where audit probability is 9.1%. Consequently, expected firm audit costs decrease by 1.2% of profits if a firm with profit just above the VSS-FST engages in size management. Analogously, the decrease in the ratio of expected firm audit costs to profits due to size management is 0.5% at the SM-FST and 1.2% at the ML-FST. Hence, the benefits of size management appear to be substantial in economic terms at all FSTs, and therefore, firms have reason to manage their size at FSTs.
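The calculation for the VSS-FST can be retraced with a few lines of Python. This is a minimal sketch that uses only the rounded figures reported in the text; the function and variable names are illustrative assumptions and are not taken from the original analysis.

```python
# Ratio of conditional firm audit costs to profits for the average firm
# in each size class (rounded values as reported in the text).
conditional_ratio = {"VS": 0.305, "S": 0.134, "M": 0.049, "L": 0.017}


def ratio_at_fst(left_class, right_class):
    """Mean of the ratios in the size classes to the left and right of an FST."""
    return (conditional_ratio[left_class] + conditional_ratio[right_class]) / 2


def expected_ratio(conditional, audit_probability):
    """Expected audit costs relative to profits: conditional ratio times audit probability."""
    return conditional * audit_probability


# VSS-FST: audit probability is 3.8% just below and 9.1% just above (from the text).
vss = ratio_at_fst("VS", "S")
below = expected_ratio(vss, 0.038)
above = expected_ratio(vss, 0.091)
benefit = above - below  # decrease in expected costs from staying below the FST

print(f"conditional ratio at VSS-FST: {vss:.2%}")
print(f"expected ratio below / above: {below:.1%} / {above:.1%}")
print(f"decrease from size management: {benefit:.1%}")
```

Because the inputs are rounded, the conditional ratio at the VSS-FST comes out at roughly 21.95%, which agrees with the reported 21.9% up to rounding. The SM-FST and ML-FST figures cannot be retraced in the same way here, as the corresponding audit probabilities are not stated in the text.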

Hypothesis development
Despite the substantial benefits of size management at all FSTs, it remains an empirical question whether firms in our setting engage in size management, as no data are available to provide a reliable estimate of optimization costs for German firms. However, the criteria applied for segmentation and the high complexity of the threshold-dependent enforcement regime in Germany are expected to increase optimization costs as described in Sect. 2.2.3. First, firms in Germany have to take into account multiple criteria, i.e., profit and revenue, in their size management, which makes size management considerably more difficult and more time-consuming. Second, profit and revenue are more difficult to manage than profit alone, as revenue cannot be adjusted through additional expenditures at the last minute. Finally, the complexity of the enforcement regime, e.g., four different size classes, regular adjustments of FSTs and industry-specific FSTs, also makes FSTs less salient for firms and increases optimization costs via the information costs channel.
Accordingly, we state H1 as follows in the alternative form:
Hypothesis 1: Threshold-dependent tax enforcement is associated with size management.
As shown in Fig. 1, the benefits of size management, i.e., the decreases in expected firm audit costs, vary between FSTs. However, the variation is not substantial. Accordingly, we state H2 as follows in the alternative form:
Hypothesis 2: The extent of size management, i.e., the number of firms engaged in size management relative to the total number of firms around that FST, varies between size classes.
As discussed in Sect. 2.2.2, conditional firm audit costs vary between industries. For instance, under third-party reporting, the traceability of transactions is presumably larger in industries with a major share of business customers compared to industries with a major share of individual customers. Hence, incentives to engage in size management vary between AICs. Accordingly, we state H3 as follows in the alternative form:
Hypothesis 3: The extent of size management varies between AICs.

Empirical strategy
To test our hypotheses, we exploit the fact that size management creates a discontinuity around the FST in an otherwise relatively smooth firm size distribution. More specifically, size management creates a missing mass (smaller number of firms than any continuous distribution would predict) above the FST and an excess mass (larger number of firms than any continuous distribution would predict) below it. Due to variable adjustment costs, the missing mass is expected to derive from a limited area above the FST. Furthermore, also due to variable adjustment costs, the excess mass is expected to be located in a limited area below the FST. 17 To test H1 and H2, we fit a polynomial to the distribution of SIZE, which denotes profit (EBT) and revenue (REV), i.e., the two size variables on which FSTs are based in our setting. Technically, both EBT and REV are standardized by dividing all values of SIZE by the FST last made publicly available for the respective AIC, i.e., the standardized variables take a value one if a firm exactly meets the FST. 18 We adapt techniques from prior bunching literature (Chetty et al., 2011; Kleven and Waseem, 2013; Saez, 2010). 19 Specifically, we divide SIZE into equal-sized bins and fit a fifth-order polynomial using the midpoint of each bin as data points. We estimate a regression of the following form:

$$F_j = \sum_{i=0}^{5} \beta_i \,(x_j)^i + \sum_{k=x_{lb}}^{x_{ub}} \gamma_k \,\mathbb{1}(x_j = k) + \varepsilon_j \tag{1}$$

where $F_j$ is the percentage of firms in bin $j$ (i.e., relative to the total number of firms in all bins), $x_j$ is the SIZE midpoint of bin $j$ and the $\gamma_k$'s are intercept shifters, i.e., coefficients for each of the bins in the bunching interval, i.e., the area where bunching is expected. The indicator function $\mathbb{1}(x_j = k)$ takes the value one for each of the bins in the bunching interval with $x_{lb}$ and $x_{ub}$ being the lower and upper bounds of the bunching interval, respectively. Consistent with Bernard et al. (2018), we choose the bin width as 2% of the FST. This bin width is large enough for the distributions of SIZE to be relatively smooth (in the absence of size management) but presumably small enough for firms to manage size by the amount corresponding to the bin width at a reasonably low cost. 20 Following Almunia and Lopez-Rodriguez (2018) and Bernard et al. (2018), we focus on firms in the interval between 50 and 150% of each FST to obtain precise estimates. We set the lower bound of the bunching interval as three bins to the left and the upper bound as three bins to the right of the FST. H1 predicts size management to occur around the FSTs. H1 is confirmed if any of the $\gamma_k$'s to the left ($\gamma_{0.95}$, $\gamma_{0.97}$, $\gamma_{0.99}$) are positive and significant, indicating an excess mass below the respective FST. Furthermore, the coefficients are expected to decrease in absolute values with increasing distance to the FST due to increasing optimization costs. 21 H2 predicts that the extent of size management varies between size classes. H2 is confirmed if the $\gamma_k$'s to the left differ significantly across individual FSTs.

17 The excess mass is not expected to form a single spike at the FST, as firms are unable to manage size to exactly match the FST, e.g., due to the indivisibility of transactions (Almunia and Lopez-Rodriguez, 2018).
18 Recall that at the time firms have to engage in size management, firms do not know the exact FSTs that will be applied in the next segmentation cycle, but firms are aware of FSTs applied for the current segmentation cycle and know that FSTs have never been adjusted downward in the past. Consequently, it is reasonable to assume that using a conservative approach, firms will manage their size to the FSTs applied for the current segmentation cycle (see Sect. 2.1). However, we repeat our analyses at different placebo FSTs to ensure that our results are not driven by the selection of FSTs (Sect. 8.6).
19 We apply alternative bunching tests in Online Appendix A.4 to corroborate our results.
20 We apply two different specifications of the bin width, i.e., 0.5-1%, in Online Appendix A.3 to ensure that our results are not driven by the model specification.
H3 predicts that the extent of size management varies between AICs. To test H3, we estimate Equation 1 separately for each of the four main AICs for both EBT and REV. H3 is confirmed if the $\gamma_k$'s to the left differ significantly across individual AICs. To control for differences between industries within AICs, we also estimate Equation 1 separately for every individual industry as defined by the 2-digit NACE code. 22 Again, H3 is confirmed if the $\gamma_k$'s to the left differ significantly across individual industries.
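The estimation steps described in this section (binning standardized SIZE, fitting a fifth-order polynomial to the bin midpoints and adding one intercept shifter per bin in the bunching interval) can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimation code behind the reported results: it uses NumPy's ordinary least squares solver, centers the polynomial on the FST for numerical stability (which leaves the intercept shifters unchanged), assumes a window that is symmetric around the FST, and omits standard errors and t values. The function names `bin_shares` and `fit_bunching` are hypothetical.

```python
import numpy as np


def bin_shares(size, bin_width=0.02, lo=0.5, hi=1.5):
    """Bin standardized SIZE (FST = 1) and return bin midpoints and shares in percent.

    Observations outside [lo, hi] are ignored by np.histogram.
    """
    n_bins = int(round((hi - lo) / bin_width))
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(size, bins=edges)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return midpoints, 100.0 * counts / counts.sum()


def fit_bunching(midpoints, shares, order=5, n_bunch=3):
    """OLS of bin shares on a polynomial in the midpoints plus bunching-bin dummies.

    Returns the intercept shifters keyed by bin midpoint (e.g. 0.99, 1.01).
    Assumes an even number of bins placed symmetrically around the FST.
    """
    n = len(midpoints)
    first_right = n // 2  # index of the first bin above the FST
    bunch = np.arange(first_right - n_bunch, first_right + n_bunch)

    # Polynomial terms (x - 1)^0 ... (x - 1)^order; centering only reparametrizes.
    poly = np.vander(midpoints - 1.0, order + 1, increasing=True)

    # One dummy column per bin in the bunching interval.
    dummies = np.zeros((n, len(bunch)))
    dummies[bunch, np.arange(len(bunch))] = 1.0

    design = np.hstack([poly, dummies])
    coef, *_ = np.linalg.lstsq(design, shares, rcond=None)
    shifters = coef[order + 1:]
    return {round(float(midpoints[b]), 2): float(g) for b, g in zip(bunch, shifters)}


# Simulated placebo data: a smooth distribution without any size management.
rng = np.random.default_rng(0)
size = rng.uniform(0.5, 1.5, 200_000)
mid, share = bin_shares(size)
for x, g in fit_bunching(mid, share).items():
    print(f"shifter at {x:.2f}: {g:+.3f}")
```

With a smooth simulated distribution, the six intercept shifters should be close to zero. An excess mass just below the FST would show up as a positive shifter at the 0.99 bin.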

Sample selection
We obtain administrative microlevel tax return data for 2010 on the entire population of German firms from the Research Data Center (RDC) of the Federal Statistical Office and the Statistical Offices of the Federal States. 23 All data are taken from the firms' submitted tax returns, i.e., the data are prior to changes induced by audits. Specifically, we obtain data on the CIT of corporations, PIT of partners in partnerships and local business tax (LBT) of corporations, partnerships and sole proprietors. 24 We also obtain data on both annual VAT returns and VAT prefiling returns (prefilings usually occur monthly or quarterly). Table 4 shows the sample selection process.
The data originally include 2,756,463 firms with information on both REV and EBT. 25 We first exclude 36,571 (1.33%) firms that belong to a fiscal unity group for either CIT, LBT or VAT, as the FSTs refer to individual legal entities, while the available data contain information only on profit and revenue aggregated at the fiscal unity level.

21 Negative and significant $\gamma_k$'s to the right ($\gamma_{1.01}$, $\gamma_{1.03}$, $\gamma_{1.05}$) indicating a missing mass are not required to confirm H1 as the missing mass might be dispersed across a larger area.
22 We show that for most industries the sample size is large enough to keep the probability of making a type II error below 1% in Online Appendix A.2.
23 We repeat our analyses with data for 2004 and 2007 to ensure that our results are not only prevalent in 2010 (Sect. 8.2).
24 Partnerships and sole proprietors in certain industries, such as legal consulting and agricultural or forestry firms, do not pay local business tax and are thus not included in the data.
25 The raw data also include 5,183,225 firms with a missing entry for REV and/or EBT in 2010. These firms are excluded altogether.
The data in principle contain information about the exact AICs to which a firm is allocated by the tax administration (i.e., trading, manufacturing, freelancing or services). However, for some of the firms, this information is missing. If this is the case, we use 5-digit NACE codes and information on legal form and LBT liability to allocate firms to the correct AICs. We ultimately exclude 6,934 firms (0.25%) that cannot be allocated to a unique AIC with the available information and 32,403 (1.18%) firms that do not belong to one of the four main AICs. Finally, we exclude all industries as defined by the 2-digit NACE code with fewer than 50 observations in the interval between 50 and 150% of each FST for EBT and REV so that, on average, we have at least one observation for each of the 50 bins used in the regression for any industry-specific analysis. This process excludes 678 (0.02%) firms. Our final sample contains 2,679,877 (97.22%) firms. 26 If available, we use the reported profit of either the CIT return or the PIT return as our EBT variable, which is also the variable definition used by the tax administration. If neither of these variables is available, we use the profit reported on the LBT return, which is closely associated with the profit of the CIT or the PIT returns. As our REV variable, we use revenues reported on annual VAT returns, which is again the variable definition used by the tax administration. If this variable is not available, we use firm-level cumulated revenues as reported on all 2010 prefiling VAT returns.
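The sample selection steps can be checked with simple arithmetic. The following sketch uses only the firm counts stated above; the dictionary labels are shorthand introduced here for illustration.

```python
# Firms with information on both REV and EBT before any exclusion.
raw_firms = 2_756_463

# Exclusion steps and firm counts as stated in the text.
exclusions = {
    "fiscal unity group (CIT, LBT or VAT)": 36_571,
    "no unique AIC": 6_934,
    "outside the four main AICs": 32_403,
    "industries with fewer than 50 observations": 678,
}

remaining = raw_firms
for reason, n_firms in exclusions.items():
    remaining -= n_firms
    print(f"{reason:<45} -{n_firms:>7,} ({n_firms / raw_firms:.2%})")

final_share = remaining / raw_firms
print(f"{'final sample':<45} {remaining:>8,} ({final_share:.2%})")
```

The remaining count and its share of the initial data match the final sample of 2,679,877 firms (97.22%) reported above.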

Descriptive statistics
Descriptive statistics of raw, i.e., nonstandardized, EBT (rawEBT) and raw REV (rawREV) are reported in Table 5. We report nonstandardized values of SIZE here to allow a better understanding of the data. We also report in Table 5 the exact tax returns that are used to collect rawEBT and rawREV.
The average firm reports a rawEBT of 60,240 euros (median: 16,224 euros). rawEBT is based on CIT data in 23.26% of cases, PIT data in 18.56% of cases and LBT data in 58.19% of cases. The average rawREV is 941,742 euros (median: 101,390 euros). rawREV is based on VAT returns in 99.73% of cases and VAT prefiling returns in 0.27% of cases.
We further provide a naive graphical assessment of the distributions of EBT and REV. Figures 2 and 3 show the firm size distribution of EBT and REV, respectively, around the VSS-FSTs, SM-FSTs and ML-FSTs for the segmentation cycle starting in 2010 (solid vertical line) for the overall population of firms and separately for each AIC. The bin width is set to 2% of the FSTs. The bunching interval is set to three bins to the right and three bins to the left (dashed vertical lines).

The distributions of both EBT and REV are relatively smooth and decrease in firm size around all FSTs. The distributions also become more convex for smaller FSTs. There are no notable discontinuities at any of the FSTs, neither for the full sample of firms nor when considering the four AICs separately. In Fig. 2, we note that EBT has some visible spikes in the distributions (while REV does not). However, these spikes in EBT do not appear to be associated with size management, as they seem to be distributed at random.

Results
Tables 6 and 7 report the regression results from estimating Equation 1 for EBT and REV, respectively. Panel A presents findings for H1 and H2, i.e., the results for the full sample of firms at the VSS-FSTs, SM-FSTs and the ML-FSTs for the segmentation cycle starting in 2010. Panel B presents AIC-specific findings for H3, i.e., subsample results per AIC at the VSS-FSTs, SM-FSTs and ML-FSTs. We report the coefficients for three bins to the left of the FSTs ($\gamma_{0.95}$, $\gamma_{0.97}$, $\gamma_{0.99}$) and three bins to the right ($\gamma_{1.01}$, $\gamma_{1.03}$, $\gamma_{1.05}$) in Panel A and the coefficient of the first bin to the left ($\gamma_{0.99}$) and the first bin to the right of the FSTs ($\gamma_{1.01}$) in Panel B.
All coefficients but one are economically small and statistically nonsignificant for the full sample of firms reported in Panel A of Tables 6 and 7. Hence, our data do not support H1, i.e., we do not find evidence of size management around FSTs. Consequently, the first implication of our results is that for German firms, optimization costs exceed the benefits of size management. Furthermore, as the coefficients are nonsignificant across all FSTs, the data also do not support H2, i.e., our results imply that optimization costs exceed the benefits in all size classes despite heterogeneity in benefits between those size classes. Along the same lines, all but three of the 48 coefficients per AIC reported in Panel B of the tables are economically small and statistically nonsignificant at the 10% level, which implies that optimization costs exceed benefits in all AICs. Hence, optimization costs appear to be considerable.
Figures 4 and 5 present the industry-specific findings for H3, i.e., subsample results per industry at the VSS-FSTs, SM-FSTs and ML-FSTs for the segmentation cycle starting in 2010. Note that under the null of an absence of size management, coefficients are asymptotically normally distributed around zero. Consequently, t values are asymptotically standard normally distributed, and p values are asymptotically uniformly distributed between zero and one.
In Panel A of the figures, we plot histograms and kernel estimates of density (solid line) for the regression coefficients to the left ($\gamma_{0.99}$) for each of more than 70 industries in our sample. In Panel B of the figures, we plot kernel density estimates (solid line) for the respective t values and compare them to a standard normal density distribution (dashed line) to determine how the empirical distributions of t values fit the theoretical distribution of t values under the null. Finally, in Panel C of the figures, we plot the empirical cumulative distribution functions (ECDFs) for the respective p values. If the p values are distributed uniformly, the ECDF (short-dashed line) follows the line of equality (solid diagonal line).
In Panel A and Panel B of the figures, the vertical axis presents the (empirical) density. In Panel C of the figures, the vertical axis presents the (empirical) cumulative probability. The horizontal axis shows the coefficients, t values and p values.
Panel A of Figs. 4 and 5 shows a symmetric density distribution for the regression coefficients centered around zero for all FSTs and for both EBT and REV. Furthermore, the empirical density distributions for t values in Panel B fit well with the theoretical density distribution under the null. Additionally, the ECDFs for p values in Panel C follow the line of equality for all FSTs and for both EBT and REV. Hence, consistent with the AIC-specific findings for H3, the results imply that optimization costs exceed the benefits of size management even when controlling for industry-specific heterogeneity in conditional firm audit costs. Overall, our data do not support H3. Furthermore, as the density distributions of the coefficients are centered around zero, we find an indication that our results are not caused by particularly large standard errors but that coefficients are, in fact, very close to zero. We find very similar results for the first coefficients to the right of the FSTs ($\gamma_{1.01}$) (not graphed).
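The logic behind Panel C can be illustrated with a short sketch: t values are converted into two-sided p values under a standard normal null, and the ECDF of these p values is compared with the line of equality. The t values below are hypothetical placeholders rather than estimates from our data, and the maximum vertical gap used here as a summary measure is an illustrative choice, not a statistic reported in the paper.

```python
import math


def two_sided_p(t_value):
    """Two-sided p value of a t value under a standard normal null."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))


def max_gap_to_uniform(p_values):
    """Largest vertical distance between the ECDF of p values and the line of equality."""
    ordered = sorted(p_values)
    n = len(ordered)
    gap = 0.0
    for i, p in enumerate(ordered):
        # The ECDF jumps from i/n to (i + 1)/n at p; check both sides of the jump.
        gap = max(gap, abs((i + 1) / n - p), abs(i / n - p))
    return gap


# Hypothetical t values standing in for industry-level estimates.
t_values = [-1.3, -0.6, -0.2, 0.1, 0.4, 0.9, 1.7]
p_values = [two_sided_p(t) for t in t_values]
print(f"max gap to the line of equality: {max_gap_to_uniform(p_values):.3f}")
```

A small gap indicates that the p values are close to uniformly distributed, which is what would be expected in the absence of size management.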
Considered jointly, our data do not support a rejection of the null of an absence of size management at FSTs. This is true for both EBT and REV. Our results further suggest that optimization costs exceed the benefits of size management even when controlling for size class-specific and industry-specific heterogeneity in conditional firm audit costs. Our results correspond to the results found by Tennant and Tracey (2019) for firms in Jamaica and are in contrast to the results found by Almunia and Lopez-Rodriguez (2018) for Spanish firms. Accordingly, we argue that a pattern seems to be emerging from this relatively new field of research on how the specific design of threshold-dependent policies, i.e., the criteria applied for segmentation and the complexity of the threshold-dependent enforcement regime, can inhibit size management. Specifically, we note that Germany and Spain are relatively similar in important drivers of optimization costs because they are similarly developed countries (as measured by GDP per capita) 27 located in Western Europe, do not differ substantially in terms of the level of trust in public institutions (as measured by the corruption perception index) 28 and have similar tax rates in terms of CIT, PIT and VAT rates. 29 However, the specific design of the threshold-dependent policies differs strongly between the two countries. Specifically, whereas the German regime relies on multiple criteria for segmentation, the Spanish regime is based on a single criterion. Furthermore, the German regime is generally more complex because it relies on four different size classes, regular adjustments of FSTs and industry-specific FSTs. By contrast, the Spanish regime only differentiates between two size classes, FSTs are fixed in nominal terms, and FSTs do not differ across industries. Hence, we argue that the specific design of threshold-dependent policies can inhibit size management by increasing firms' optimization costs. However, ultimately, we do not have a clear enough setting to provide direct evidence that the different outcomes for Spain and Germany are driven by the specific design of the threshold-dependent enforcement regime.

Table 6 (regression results for EBT), notes: Panel A presents findings for H1 and H2, i.e., results for the full sample of firms at the VSS-FST, SM-FST and the ML-FST for the segmentation cycle starting in 2010. Panel B presents the findings for H3, i.e., subsample results at the VSS-FST, SM-FST and ML-FST per AIC. We report the coefficients for three bins to the left of the FST ($\gamma_{0.95}$, $\gamma_{0.97}$, $\gamma_{0.99}$) and all three bins to the right ($\gamma_{1.01}$, $\gamma_{1.03}$, $\gamma_{1.05}$) in Panel A but only the coefficient of the first bin to the left ($\gamma_{0.99}$) and the first bin to the right ($\gamma_{1.01}$) in Panel B. t values in parentheses. ***, ** and * denote significance at the 1, 5 and 10% levels, respectively (two-tailed).

Robustness tests

Adjustment costs vs. information costs
In our setting, it is not possible to empirically disentangle the effects of the two components of optimization costs, i.e., adjustment costs and information costs. However, an absence of size management would be unlikely if adjustment costs were the only friction at work (Bosch et al., 2019;Søgaard, 2019). In particular, due to variable adjustment costs, it appears unlikely that adjustment costs exceed the decrease in expected firm audit costs for firms in close proximity to the FST. Therefore, we argue that information costs play an important role in our setting. To provide some evidence for this argument, we consider an additional setting in which bunching has been identified by prior studies. Specifically, we analyze the distribution of the financial accounting after-tax profits (as reported in CIT returns) around zero, as there is ample empirical evidence (Bollen and Pool, 2009;Burgstahler and Dichev, 1997;Lahr, 2014) that firms attempt to avoid reporting losses for various reasons. The histogram in Fig. 6 shows the distribution of firms' ratios of after-tax profits to REV around zero (solid vertical line) in a range between −0.2 and 0.2%. The bin width is set to 0.01%. The bunching interval is set to three bins to the right and three bins to the left (dashed vertical lines).
There is a discernible discontinuity in the distribution of firms at zero in the otherwise smooth (uniform) distribution, i.e., there is bunching above zero. 30 Furthermore, we estimate Equation 1 at zero for firms with after-tax profitability between −0.2 and 0.2% and the bin width set to 0.01%. All three regression coefficients to the right are significantly positive (not tabulated), which implies that there is an excess mass between zero and 0.03%. The first coefficient to the right is significantly larger than the second coefficient and the third coefficient, which implies that firms prefer to manage their size by the smallest amount necessary to exceed the implicit threshold, suggesting variable adjustment costs. The coefficients to the left are negative but nonsignificant, suggesting that size-managing firms originate from a large area below the threshold, i.e., the missing mass is rather dispersed. Considered jointly, the results provide some indication that firms in our data practice size management and hence that adjustment costs are unlikely to be the only friction at work. In addition, the results provide some evidence that our test is sensitive enough to detect bunching.

Time effects
Specific time effects might have prevented size management in 2010. One reason for such an effect could be, among others, the financial crisis around that time. To ensure that our results are not only prevalent in 2010, we repeat our baseline analyses from Tables 6 and 7 as well as Panel C of Figs. 4 and 5 using data for 2004 and 2007 (not tabulated or graphed). 31 We again do not find any evidence of size management around FSTs.

Firms not exceeding the respective other firm size threshold
Due to variable adjustment costs, firms that exceed the FST for revenue by far, and thus face adjustment costs that exceed the benefits of size management, have no incentive to manage size at the respective FST for profit and vice versa. To reduce noise in our analyses that might stem from keeping such firms in the sample, we repeat the baseline analyses from Tables 6 and 7 as well as Panel C of Figs. 4 and 5 while restricting our sample to firms that do not exceed the respective FSTs for revenue (profit) when examining the FSTs for profit (revenue) (not tabulated or graphed). However, the results remain virtually unchanged.

Loss firms
Chen and Lai (2012), Edwards et al. (2016) and Law and Mills (2015) show that, due to a higher cost of external financing, financially constrained firms engage in more aggressive tax avoidance than unconstrained firms to increase internally generated funds. Correspondingly, loss firms might have larger incentives to engage in size management at FSTs for revenue. Hence, we repeat our baseline analyses from Table 7 while restricting our sample to firms with negative EBT (not tabulated or graphed). However, we again do not find any evidence of size management around FSTs.

Geographic heterogeneity
Audit intensity may vary between German states due to different resources being available for audits (see Sect. 3.1). Hence, it is conceivable that size management occurs only in states that allocate substantial resources to audits and that the respective effects in our full sample analysis are masked by the noise of states without effects. We therefore repeat the analyses from Panel B of Tables 6 and 7 per state instead of per AIC (not tabulated). However, we do not find any evidence of size management around FSTs, suggesting an absence of size management for all 16 states.
Along the same lines, as audits are conducted by local tax offices, audit intensity can also be conditional on the specific tax office responsible for an audit. Each tax office is usually responsible for one of the 400 German districts. Hence, it is conceivable that size management is heterogeneous across individual districts. Therefore, we replicate the baseline analyses from Panel C of Figs. 4 and 5 (not graphed) per district instead of per industry. We again do not find any evidence of size management around FSTs.

Relevant firm size thresholds
Due to marginal adjustments of FSTs before each segmentation cycle, firms do not know the exact FSTs that will be applied in the next segmentation cycle when they have to engage in size management (see Sect. 3.2). However, firms are aware of FSTs applied for the current segmentation cycle when they have to engage in size management, and FSTs have historically never decreased. Consequently, we assume in our baseline analyses that firms using a conservative approach manage their size to the FSTs applied for the current segmentation cycle. However, some firms could also be less risk averse and attempt to predict the FSTs that will be applied in the next segmentation cycle, and hence, these firms would bunch in an area above the FSTs applied in the last segmentation cycle. If this is the case, the baseline analyses would not be well suited to detect size management. Therefore, we repeat the baseline analyses from Panel A of Tables 6 and 7 at different placebo FSTs (not tabulated). To obtain the placebo FSTs, we start with the FSTs applied for the segmentation cycle starting in 2010 and gradually increase FSTs in steps of 100 euros until the placebo FSTs correspond to the FSTs applied for the segmentation cycle starting in 2013. However, we still do not find any evidence of size management around those placebo FSTs. 32
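The construction of the placebo FSTs can be sketched as a simple grid. The function name `placebo_fsts` and the threshold values in the usage example are hypothetical; the actual FSTs are those reported in Table 1. The sketch also assumes that the final placebo FST is set equal to the FST of the next segmentation cycle when the distance between the two FSTs is not a multiple of the step size.

```python
def placebo_fsts(fst_current, fst_next, step=100):
    """Placebo thresholds (in euros) from the current FST up to the next cycle's FST."""
    if step <= 0 or fst_next < fst_current:
        raise ValueError("step must be positive and fst_next must not be below fst_current")
    grid = list(range(fst_current, fst_next, step))  # excludes fst_next
    grid.append(fst_next)  # assumption: always end exactly at the next cycle's FST
    return grid


# Hypothetical thresholds for illustration only (not taken from Table 1).
for fst in placebo_fsts(50_000, 50_450):
    print(fst)
```

Equation 1 would then be re-estimated at each placebo FST, with SIZE standardized by the respective placebo value.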

Conclusion
This paper contributes to the recent literature on the effects of threshold-dependent tax enforcement. We analyze the response of German firms to discontinuities in audit intensity at publicly known FSTs. Given that tax audits usually result in substantial tax claims, interest payments and penalty fees and can cause substantial compliance costs, it would be expected that size management occurs around the FSTs. Using a large administrative dataset of tax returns, we test this prediction and exploit discontinuities in the firm size distribution that would be expected from size management. Building on established tests for bunching in the context of notches (Chetty et al., 2011; Kleven and Waseem, 2013; Saez, 2010), our empirical results indicate that there is no tax-induced size management in the overall population of German firms. The results hold when extensive testing in a large variety of subsamples is conducted, when alternative bunching tests are applied and when alternative periods of analysis and alternative FSTs are used.
We posit that the absence of size management results from optimization costs in the form of adjustment costs and information costs. Against the background of prior research, we argue that a pattern seems to be emerging that the specific design of threshold-dependent policies can inhibit size management. Specifically, we argue that using multiple criteria for segmentation, multiple size classes, regular adjustments of FSTs after firm decisions are made and industry-specific FSTs increases optimization costs and, hence, can inhibit size management. Therefore, our findings provide relevant implications for policymakers, as they suggest that the specific design of threshold-dependent policies might allow governments to increase the efficiency of tax audits without distorting the firm size distribution and, hence, avoid the negative effects of size management on welfare. However, more research is needed to disentangle in detail the effects that individual characteristics of threshold-dependent enforcement regimes have on optimization costs.