1 Introduction

Environmental sustainability (E-S) is an increasingly desirable corporate attribute, where firms are expected to deliver their goods and services in a manner that is both environmentally sustainable and profitable.Footnote 1 A common theme of E-S is that it promotes a long-term view of shareholder value and thus curbs managerial short-termism (Eccles et al. 2014; Fink 2019; Starks et al. 2020). In this paper, we investigate two related questions about the trade-off between E-S and managerial short-termism, where short-termism is typically reflected as efforts to meet earnings benchmarks (e.g., Dechow and Sloan 1991). Do firms release more toxins by cutting back on pollution abatement costs to boost earnings in years they meet earnings benchmarks? If so, is that relation weaker for firms with higher environmental ratings? We define meeting earnings benchmarks as meeting or just beating (by two cents or less) analysts’ consensus earnings per share (EPS) forecasts (e.g., Caskey and Ozel 2017). We use releases of about 400 toxins to capture pollution, and measure environmental ratings as the environmental component (E ratings) of firm-level environmental, social, and governance (ESG) ratings.

One motivation to investigate these two questions is the adverse effects of toxins on public health, an important concern that has received much attention in the U.S. (e.g., Currie et al. 2014). Understanding why firms pollute is thus of interest to the general public, and of particular interest to regulators tasked to preserve clean air, water, and land. Second, while prior research documents the negative externalities of real earnings management (REM) on firm stakeholders such as investors and employees (Caskey and Ozel 2017; Raghunandan 2021), REM’s impact on society remains relatively unexplored.Footnote 2 Finally, whether the focus on long-term value promoted by E-S moderates the managerial short-termism underlying REM is also unexplored.

Our data on toxic releases are taken from the EPA’s Toxic Release Inventory (TRI) database, which covers chemical releases that may have long-term health and environmental effects.Footnote 3 While TRI chemicals are toxic, they are not as hazardous as the six criteria air pollutants (including SO2, lead, and ozone), which are monitored by the EPA in real time to allow immediate responses if limits are breached.Footnote 4 In contrast, TRI disclosures are self-reported and filed with a delay (by June 30 of the next year).

To illustrate how firms might cut pollution abatement costs to meet earnings benchmarks, consider the seven years around 2009 for Texas Instruments (TI), a semiconductor manufacturer in our sample. TI just met the consensus analyst forecast in 2009, and in the other six years it either missed or comfortably beat the consensus (by more than 2 cents). In 2009, TI released an incremental 6.4 tons of hydrogen fluoride (HF), a toxin generated when fluorine etches silicon used to make semiconductor wafers.Footnote 5 Packed bed wet scrubbers, the most common abatement technology for HF, can be switched off to reduce abatement levels and later switched on. Using an estimate of $20.6 million for the abatement cost per ton of HF, pre-tax earnings per share for TI increased by about 10 cents.Footnote 6 We note that TI has a high E rating, which runs counter to the view underlying our second research question: managers of environmentally friendly firms with a long-term focus should be less likely to pollute to meet short-term earnings benchmarks.
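The arithmetic behind the TI illustration can be retraced with a short back-of-envelope sketch. The three inputs are the figures quoted above; the share count is not reported in the text and is simply backed out from those figures, so it should be read as an implied quantity rather than a reported one.

```python
# Back-of-envelope check of the TI illustration (inputs quoted in the text).
incremental_tons = 6.4      # incremental HF released in 2009, in tons
cost_per_ton_musd = 20.6    # estimated abatement cost per ton, in $ millions
eps_gain_usd = 0.10         # approximate pre-tax EPS increase ("about 10 cents")

# Abatement cost avoided by not treating the incremental release.
avoided_cost_musd = incremental_tons * cost_per_ton_musd

# Share count (in millions) implied by the two figures above.
# Assumption: this is derived here, not taken from the paper.
implied_shares_mn = avoided_cost_musd / eps_gain_usd

print(f"avoided abatement cost: ${avoided_cost_musd:.1f} million")
print(f"implied shares outstanding: {implied_shares_mn:.0f} million")
```

Under these inputs the avoided cost is roughly $132 million, which corresponds to about 10 cents per share only if the share base is on the order of 1.3 billion shares.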

Consistent with the TI illustration, we find that toxin releases for “suspect” firm-years in our sample—firm-years that meet or beat the consensus by 2 cents or less—are higher by about 15%, an economically and statistically significant increase, relative to firm-years that miss or comfortably beat forecasts. We control for a variety of firm-level proxies for normal levels of production, various fixed effects that account for unobservable permanent firm/plant characteristics, and time-varying industry-wide shocks. These results suggest that pollution rises in years when managers who are under pressure to meet earnings benchmarks cut pollution abatement costs to boost earnings.

We confirm that this first finding is robust. We observe similar results when we scale toxic releases by revenues or weight released toxins by their relative toxicity (assigned by the EPA’s Risk-Screening Environmental Indicator (RSEI) program). Our results are not due to pollution increases caused by firms overproducing to boost earnings, as production does not increase abnormally in years when firms meet earnings benchmarks. And the observed rise in pollution levels is not a byproduct of (unexpected) performance improvements, because pollution levels do not increase for firms that comfortably beat forecasts. Finally, as a falsification test, we show that the positive relation between pollution and meeting earnings benchmarks declines when we use noisier proxies (from earlier in the fiscal year) for those benchmarks.

Turning to our second research question, we find that the additional toxic releases observed in suspect firm-years increase with a firm’s E rating from MSCI ESG KLD STATS (shortened to MSCI). As mentioned in the TI illustration above, this result is surprising because managers of firms with higher E ratings—which we find reliably reflect environmental sustainability track records—should focus on the long term and avoid pollution increases. Although these firms pollute less on average, which is consistent with a long-term focus, they increase pollution levels when needed to meet earnings benchmarks. Managers of firms with high E ratings appear to be trading off multiple conflicting goals (Levinthal and Rerup 2021): they maintain lower pollution levels over the long term, yet increase pollution to meet market expectations for short-term earnings.

To confirm the robustness of this second result, we conduct additional tests leveraging other information provided by MSCI. First, in addition to E ratings, MSCI provides S&G ratings, which serve as placebos because they should have weaker links with toxic releases. We find that the S&G ratings do not explain variation in the positive relation between pollution and meeting benchmarks. Second, these E ratings cover a wide range of environmental performance indicators, some that are directly related to toxic releases generated by production processes (e.g., pollution prevention and past toxic emissions) and others that are not (e.g., producing environmentally beneficial products). We find that the moderating effect of the overall E rating is driven by the subset of environmental indicators that are directly related to toxic releases from production processes. Finally, MSCI separates environmental indicators into environmental strengths (e.g., pollution prevention programs) and concerns (e.g., quantities of past toxic emissions), representing mainly inputs to and outcomes from environmental programs, respectively. We find that the relation between pollution and meeting earnings benchmarks only increases (decreases) with strengths (concerns) that are directly related to toxic releases.

One plausible explanation for our surprising second result is that firms with better environmental track records (reflected in higher E ratings) generate “pollution slack” that allows managers to increase pollution, when needed, at a lower cost than for firms with poor E ratings. Slack could arise from higher accumulated emission allowances that offer firms the flexibility to temporarily increase pollution without exceeding regulatory emission limits (regulatory slack). Slack could also arise if a good reputation that firms have built over the long term with stakeholders such as consumers, communities, and investors softens the negative impact of occasional pollution increases (reputation slack).

Consistent with this explanation, we find that higher E ratings are associated with lower toxic releases, fewer EPA enforcement cases, and smaller EPA enforcement penalties in the past. We also find that the moderating effect of environmental track records on the positive relation between pollution and meeting earnings benchmarks, observed across firms, holds within firms at the plant level: firms that pollute more to meet earnings benchmarks increase pollution at plants that have better track records.

We consider the impact of our findings on sustainable investing, which refers to the practice of incorporating ESG factors in portfolio decisions (Amel-Zadeh and Serafeim 2018). Prior studies (Starks et al. 2020; Cao et al. 2021) show that sustainable funds are more patient—i.e., they take a longer-term perspective—and respond less to short-term earnings information.Footnote 7 However, if sustainable funds invest in firms with higher ESG ratings (Curtis et al. 2021; Heath et al. 2021), our findings above suggest that these investees are likely to pollute more to achieve short-term earnings benchmarks. We find that this is indeed the case: firms with higher levels of sustainable institutional ownership tend to pollute more when meeting earnings benchmarks.

It is useful to contrast our findings with those in Liu et al. (2021), who document increased pollution by Chinese firms meeting earnings benchmarks. On the surface, the analyses of our first research question may seem similar to Liu et al. (2021), but there are important differences between our inferences and theirs. Their result only holds for state-owned (SOE) firms controlled by the central government. Chinese non-SOE firms, which are the relevant peer group for our U.S. sample, do not pollute more to meet earnings benchmarks. To better understand the different inferences from the two studies, we focus on the respective pollutants investigated: we study a broad group of TRI chemicals, whereas they study SO2, a criteria air pollutant that the EPA monitors in real time because it is so hazardous.Footnote 8 We too find that U.S. firms do not emit more SO2 when meeting earnings benchmarks.Footnote 9 By extending our study to incorporate the second research question and ESG rating data, we are able to provide a fuller picture of cross-sectional variation in pollution levels around suspect firm-years.

We contribute to the real earnings management (REM) literature by showing that REM is associated with increased pollution by U.S. firms. Early REM research, which considers decisions such as overproduction and R&D cuts, suggests that the costs of managerial short-termism are borne by shareholders. Subsequent research finds that the costs of REM are also borne by employees, due to cuts in spending on employee safety and compensation (Caskey and Ozel 2017; Raghunandan 2021). Our results suggest that the negative effects of REM extend beyond firm boundaries to the environment and society at large.Footnote 10

Second, we add to the corporate sustainability/ESG literature by showing that even though firms that are rated as being more environmentally friendly pollute less in general, they increase pollution when meeting earnings benchmarks. This runs counter to the view that corporate sustainability curbs managerial short-termism. Finally, our study contributes to the environmental economics literature by adding financial reporting incentives to the previously documented determinants of toxic releases, which include financial constraints (Cohn and Deryugina 2018; Goetz 2018; Kim and Xu 2022), legal liability (Akey and Appel 2021), and third-party auditors (Duflo et al. 2013).

The remainder of our paper is laid out as follows. Section 2 develops hypotheses after reviewing institutional details and prior research. Section 3 provides details of our data and samples. Section 4 examines the effect of meeting benchmarks on toxic releases. Section 5 investigates the role of E-S. Section 6 conducts additional analyses, and Section 7 concludes.

2 Institutional details, related literature, and hypothesis development

2.1 Environmental regulation, pollution abatement, and accounting for pollution abatement

2.1.1 Environmental regulation

Amid elevated concerns about environmental pollution in the 1970s, Congress passed several laws (e.g., Clean Air Act, Clean Water Act, Safe Drinking Water Act, and Resource Conservation and Recovery Act) and consolidated many environmental responsibilities under one federal agency, the U.S. Environmental Protection Agency (EPA). The EPA has since served as the central authority for federal research, standard-setting, monitoring, and enforcement of environmental protection.Footnote 11 Compliance with environmental rules and standards causes firms to partially internalize environmental costs created by their operations.

Disclosure serves as an important tool within the environmental regulatory system. In 1986, Congress passed the Emergency Planning and Community Right-to-Know Act (EPCRA) to support and promote emergency planning and to provide the public with information on the release of toxic chemicals in their communities.Footnote 12 In particular, Section 313 of the EPCRA established the Toxic Release Inventory (TRI) program, which requires plants to disclose annually, to the EPA, the release of toxic chemicals that may cause negative chronic (long-term) health and environmental effects.Footnote 13 Under Section 313, reporting the amount of each chemical released to the environment is required for U.S. facilities that have at least ten full-time employees; operate in one of roughly 400 industries defined at the six-digit NAICS level; and manufacture, process, or otherwise use listed chemicals in amounts above established levels.Footnote 14 By making information about toxic chemicals publicly available, TRI creates a strong incentive for firms to improve their waste management and environmental performance. Although TRI data are self-reported, ample evidence suggests that they are generally reliable.Footnote 15 As such, TRI data are widely used in the literature to measure pollution and environmental performance (e.g., Chatterji et al. 2009; Akey and Appel 2021; Kim and Xu 2022).

Some toxins are more hazardous and have acute (immediate) effects, and consequently are subject to real-time monitoring. The six pollutants that fall in this category, labeled criteria air pollutants, are sulfur dioxide (SO2), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), lead (Pb), and particulate matter (PM10 and PM2.5). Ambient concentrations of these pollutants are measured and monitored at more than 4,000 monitoring stations owned and operated by state environmental agencies.Footnote 16 The agencies send hourly or daily measurements of pollutant concentrations to the AQS (Air Quality System), an EPA database. Firms that are noncompliant with national ambient air quality standards can face sanctions under the Clean Air Act, which may result in severe economic consequences.

2.1.2 Pollution abatement

Environmental standards typically require plants to integrate pollution abatement technologies and measures into their production processes. As a result, a plant’s pollution level is jointly determined by its production processes and pollution abatement activities. Pollution abatement activities fall into two broad categories: pollution prevention and postproduction processing (see Appendix A).

Pollution prevention generally refers to practices that reduce the amount of any hazardous substance, pollutant, or contaminant entering any waste stream or otherwise released into the environment prior to recycling, energy recovery, or treatment. Examples of pollution prevention include process or procedure modifications, equipment or technology modifications, reformulation or redesign of products, substitution of raw materials, and improvements in housekeeping, maintenance, training, and inventory control.

Postproduction processing generally refers to recycling, energy recovery, and treatment of any hazardous substance, pollutant, or contaminant chemical. Recycling is the recovery for reuse of a toxic chemical. Energy recovery is the combustion of a toxic chemical in a combustion unit that is integrated into an energy recovery system (i.e., industrial furnaces, industrial kilns, and boilers). Treatment includes a variety of methods (biological treatment, chemical oxidation, and incineration) to reduce the toxicity of chemicals in waste streams.

2.1.3 Accounting for pollution abatement costs

As pollution abatement is integrated into production processes, its costs are allocated to product costs.Footnote 17 The abatement costs generally fall into two categories: depreciation of capital expenditures and operating costs. Capital expenditures include the purchase, installation, and startup costs of abatement technologies and equipment used to prevent, recycle, recover energy from, treat, and dispose of chemical waste. Costs of production redesign, process modification, and material substitution are also capitalized, as they typically yield reduced emission costs over the long term.

Operating costs of pollution abatement occur during both pollution prevention and postproduction processing. Examples of pollution prevention costs include the cost of running and maintaining pollution prevention equipment (e.g., leak detection equipment) and the cost of modifying production processes. Examples of postproduction processing costs include employee salaries/wages, costs of materials/supplies, utility/energy costs, and costs of purchased services (e.g., contract work and lease rentals) used to test/monitor, recycle, treat, and dispose of pollutants.

The annual cost of pollution abatement (i.e., operating cost and depreciation expense) can be a nontrivial portion of total product cost, especially in certain industries. Joshi et al. (2001) offer a reliable estimate, likely at the high end of the range. Using confidential plant-level data for 55 steel mills from 1979 to 1988 drawn from the Pollution Abatement Costs and Expenditures (PACE) surveys, they estimate that periodic abatement costs account for 46%–53% of total product cost. Furthermore, according to the 2005 PACE Survey conducted by the U.S. Census Bureau and EPA, operating costs of pollution abatement are more than six times the depreciation expense of abatement investment. Given that most periodic pollution abatement costs are variable, we believe that cutting pollution abatement costs is a viable and effective way to boost short-term earnings.

2.2 Managerial short-termism, real earnings management, and externalities

The real earnings management (REM) literature provides compelling evidence, both survey-based and empirical, of managers manipulating real activities to meet earnings benchmarks even if these actions sacrifice long-term shareholder value.Footnote 18 Graham et al. (2005) report that about 80% of surveyed U.S. executives admit to sacrificing long-term value when engaged in REM. Consistent with this survey, Bhojraj et al. (2009) and Cohen and Zarowin (2010) empirically show that REM negatively impacts long-run performance.

In addition to negative effects on long-term shareholder value, recent studies show that REM can also result in negative externalities on firm employees. Specifically, efforts to meet or beat analyst forecasts lead to more workplace injuries (Caskey and Ozel 2017) and lower wages (Raghunandan 2021). We extend this emerging literature on externalities due to REM by examining the effects of meeting earnings benchmarks on the environment and society.

Using data on Chinese listed companies, Liu et al. (2021) show that Chinese state-owned enterprises (SOEs) controlled by the central government emit more SO2 when meeting or just beating analyst forecasts, but locally controlled SOEs and non-SOEs do not. Liu et al. (2021) posit that central SOEs are protected by the central government from the environmental monitoring and enforcement carried out by local environmental protection bureaus (Eaton and Kostka 2017). More relevant to our study is their finding that non-SOE firms, which represent the subset of Chinese firms that are most comparable to our U.S. sample, do not increase SO2 emissions when they meet earnings benchmarks. Compared with the TRI toxins that we focus on, SO2 is far more hazardous and is monitored in real time for U.S. firms.

2.3 Corporate sustainability and ESG

As regulation may be insufficient to reduce negative externalities—especially those driven by managerial short-termism—corporate sustainability has recently been suggested as a market-based solution (Bénabou and Tirole 2010). Grewal and Serafeim (2020) define corporate sustainability as an intentional strategy to create long-term value through improved measurable social and environmental impact.

According to this view, which is also referred to as “doing well by doing good,” corporate sustainability is a long-term strategy that leads to long-run shareholder value maximization, even though it may come at a short-run cost (e.g., Bénabou and Tirole 2010; Eccles et al. 2014; Lins et al. 2017; Starks et al. 2020).Footnote 19 Consistent with this view, Friede et al. (2015) document that 90% of academic studies find a nonnegative relationship between ESG and financial performance. In particular, Dowell et al. (2000) and Konar and Cohen (2001) find a positive relation between the environmental component of ESG and firm value. As described in Section 2.4.2 below, this view is not universally held; there is disagreement about the benefits of corporate sustainability.Footnote 20

We extend the growing sustainability/ESG literature by examining whether corporate environmental sustainability (E-S) attenuates the potential negative effect of managerial short-termism on pollution. Finding evidence consistent with this proposition reinforces the view that commitment to and investment in E-S curbs managerial short-termism.

2.4 Research hypotheses

2.4.1 Meeting benchmarks and toxic releases

The above review of institutions and literature suggests that managers trade off the short-term benefits of meeting earnings benchmarks against the potential long-term benefits of reduced pollution and improved E-S.Footnote 21 Cutting pollution abatement activities such as recycling, energy recovery, and treating pollutants reduces the cost of goods sold and increases earnings, and thereby increases the chance of meeting earnings benchmarks. Not meeting those benchmarks could have an immediate, large negative impact on share price (Skinner and Sloan 2002). While polluting to beat an earnings benchmark provides a short-term benefit for managers, the increased pollution imposes a long-term cost on the environment and society and can negatively affect the firm’s E ratings and reputation, possibly even leading to environmental enforcement and litigation.Footnote 22

As managers tend to have shorter horizons than regulators and important stakeholders (Bénabou and Tirole 2010), we expect them to overemphasize the benefits of meeting short-term earnings benchmarks. That is, we expect higher toxic releases when managers cut pollution abatement costs to boost earnings in an effort to meet those benchmarks. This expectation implies the following testable prediction:

  • Hypothesis 1: For firms that release TRI toxins, meeting earnings benchmarks is positively associated with toxic releases.

Several factors may prevent us from observing a significant positive relation between meeting benchmarks and toxic releases. The long-term reputational or regulatory costs of increased pollution may be large enough to overwhelm the short-term benefits of earnings management. Consider, for example, the evidence in Liu et al. (2021) for non-SOE Chinese firms: the cost of SO2 emissions appears to be prohibitive. Even in cases where the costs of increased pollution are lower, they might still exceed the costs of alternative ways to meet earnings benchmarks, such as guiding down analysts’ earnings forecasts or engaging in other forms of earnings management, both real and accruals-based (Dechow et al. 2010).

2.4.2 The moderating effect of environmental sustainability

Observing a positive relation between pollution and meeting earnings benchmarks, as predicted by H1, raises a follow-on question: Is that relation moderated by a firm’s commitment to E-S? The long-term shareholder view of E-S (reviewed in Section 2.3) predicts a weaker link between meeting benchmarks and toxic releases for firms that are more environmentally sustainable. This is because firms with high E ratings commit to a long-term strategy to reduce their environmental impact. They are thus less likely to increase pollution to meet short-term earnings benchmarks, which leads to the following hypothesis:

  • Hypothesis 2: If there is a positive association between meeting earnings benchmarks and toxic releases, that association is weaker among firms with higher environmental ratings.

This prediction too is not without tension. Specifically, some practitioners and academics suggest that firms engage in “greenwashing” and exaggerate their environmental sustainability efforts (Bénabou and Tirole 2010). If so, E ratings are unreliable, as they primarily reflect voluntary disclosures of environmental policies (Lopez-de-Silanes et al. 2020), not outcomes. Consistent with greenwashing, Raghunandan and Rajgopal (2021) show that Business Roundtable signatories that claim to be socially responsible are in fact more likely to violate environmental and labor regulations. To the extent that firms engaging in greenwashing are more likely to emphasize short-term goals, we expect such firms to pollute more in general (despite their high E ratings) and to pollute even more when meeting earnings benchmarks.Footnote 23

3 Data, variable measurement, and sample

3.1 Data and variable measurement

We collect plant-level data on toxic releases for different chemicals from the EPA’s TRI database. Our dependent variable is the total weight of toxic release (Toxic Release) (e.g., Akey and Appel 2021; Kim and Xu 2022). We first calculate Toxic Release at the plant level as the sum of the amount (in thousands of pounds) of all toxic chemicals released to air, water, and ground by a plant each year (Appendix B contains variable definitions). To calculate Toxic Release at the firm level, we aggregate Toxic Release across all plants owned by that firm. We assign an equal weight to each chemical for our primary measure of Toxic Release. Given that chemicals differ in toxicity, we also compute a toxicity-weighted sum of chemical releases (Toxicity-weighted Release). To do so, we use the toxicity index from the EPA’s RSEI program, which describes each chemical’s toxicity relative to other TRI-reported chemicals in terms of their chronic human health effects.Footnote 24 Following RSEI, we calculate the product of the release amounts (in thousands of pounds) and the toxicity index for each chemical, then aggregate it across all chemicals released by a plant-year or firm-year to obtain Toxicity-weighted Release.Footnote 25
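The aggregation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the construction used in the paper: the record layout, the firm and plant identifiers, the chemical names, and the toxicity index values are all hypothetical, and the sketch covers a single year.

```python
from collections import defaultdict

# Hypothetical one-year records: (firm, plant, chemical, pounds released
# to air, water, and ground combined).
records = [
    ("FirmA", "P1", "chem_x", 12_000.0),
    ("FirmA", "P1", "chem_y", 3_000.0),
    ("FirmA", "P2", "chem_x", 5_000.0),
]

# Hypothetical toxicity index values (stand-ins for RSEI toxicity weights).
toxicity_index = {"chem_x": 2.0, "chem_y": 10.0}

def aggregate(records, weights=None):
    """Sum releases (in thousands of pounds) by plant and by firm.

    With weights=None every chemical gets an equal weight of 1 (Toxic Release);
    with a toxicity index each release is multiplied by its chemical's weight
    before summing (Toxicity-weighted Release).
    """
    plant_total = defaultdict(float)
    firm_total = defaultdict(float)
    for firm, plant, chem, pounds in records:
        w = 1.0 if weights is None else weights[chem]
        amount = w * pounds / 1000.0  # convert pounds to thousands of pounds
        plant_total[(firm, plant)] += amount
        firm_total[firm] += amount
    return dict(plant_total), dict(firm_total)

plant_tr, firm_tr = aggregate(records)                    # equal-weighted
plant_tw, firm_tw = aggregate(records, toxicity_index)    # toxicity-weighted
print(firm_tr, firm_tw)
```

The firm-level figure is simply the sum over the firm's plants, so the plant-level and firm-level measures stay consistent by construction.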

Following recent studies of real earnings management (Caskey and Ozel 2017; Raghunandan 2021), we measure our primary explanatory variable—managers’ efforts to meet earnings benchmarks—as the realized incidence of meeting or just beating consensus analyst EPS forecasts. We define firm-years as suspected of managing earnings to meet benchmarks (Suspect) if the actual EPS reported by I/B/E/S minus the latest consensus forecast is between zero and two cents.Footnote 26 We construct the latest consensus forecast as the average of all analysts’ most recent forecasts issued within the [−90, −4] day window prior to the earnings announcement date. Using more recent forecasts reduces the distortion due to stale forecasts that are unlikely to be included in the targets relevant to managers. To assess the robustness of our findings, we consider alternative measures of consensus forecasts, including the average of all analysts’ most recent forecasts issued within a longer [−180, −4] window prior to the earnings announcement date, as well as the most recent consensus forecast (taken from the I/B/E/S summary files) before the earnings announcement date. We collect firm-level accounting data from Compustat and stock market data from CRSP.
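The Suspect classification can be illustrated with a small sketch. The function names, the input layout, and the example forecasts are hypothetical; the rounding of the surprise to guard against floating-point noise is an assumption of this sketch, and the text does not say how ties or rounding are handled in the actual data.

```python
from datetime import date

def latest_consensus(forecasts, announce_date, window=(-90, -4)):
    """Average each analyst's most recent forecast issued inside the window.

    forecasts: iterable of (analyst_id, forecast_date, eps_forecast).
    window: day offsets relative to the earnings announcement date.
    Returns None if no forecast falls inside the window.
    """
    latest = {}
    for analyst, fdate, eps in forecasts:
        offset = (fdate - announce_date).days
        if window[0] <= offset <= window[1]:
            if analyst not in latest or fdate > latest[analyst][0]:
                latest[analyst] = (fdate, eps)
    if not latest:
        return None
    return sum(eps for _, eps in latest.values()) / len(latest)

def is_suspect(actual_eps, consensus, max_beat_cents=2):
    """True if actual EPS meets or beats the consensus by at most two cents."""
    surprise_cents = round((actual_eps - consensus) * 100, 6)
    return 0 <= surprise_cents <= max_beat_cents

# Hypothetical example: analyst A revises once; C is stale; D is too late.
announce = date(2010, 1, 25)
forecasts = [
    ("A", date(2009, 11, 10), 1.00),
    ("A", date(2010, 1, 5), 1.02),
    ("B", date(2009, 12, 1), 1.00),
    ("C", date(2009, 9, 1), 0.90),   # 146 days before: outside [-90, -4]
    ("D", date(2010, 1, 23), 1.10),  # 2 days before: outside [-90, -4]
]
consensus = latest_consensus(forecasts, announce)
print(consensus, is_suspect(1.02, consensus), is_suspect(1.05, consensus))
```

Passing `window=(-180, -4)` gives the longer-window alternative mentioned above, which admits staler forecasts into the consensus.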

We follow the sustainability/ESG literature and measure E-S with E ratings (Gillan et al. 2021). We use data from MSCI ESG KLD STATS to construct E ratings of firms’ ongoing efforts to reduce their environmental impact. This database relies on analysts that assess firms on a wide array of performance indicators pertaining to inputs and outcomes on environmental, social, and governance factors (Chatterji et al. 2009; Cao et al. 2021). MSCI scans public databases for ESG information and updates performance indicator values each year (Krüger 2015). It classifies these performance indicators into seven broad categories: environment, community, diversity, employee relations, human rights, product, and corporate governance. Within each category, MSCI further divides the performance indicators into strengths and concerns, which roughly correspond to inputs and outcomes, respectively.

Krüger (2015) shows that the ESG ratings from MSCI are highly positively autocorrelated over time. For example, an environmental accident generates an environmental concern that remains for several years thereafter. This stickiness implies that ESG ratings reflect the “stock” of a firm’s environmental and societal attributes, which suggests that the ratings proxy well for firms’ strategy on sustainability.

Regarding coverage, MSCI contains yearly ESG ratings for 650 to 3,000 of the largest U.S. companies and has been widely used in the literature (e.g., Lins et al. 2017). It has two major advantages relative to other ESG databases such as Refinitiv (formerly ASSET4): it covers more U.S. firms over our sample period, and it provides information about outcomes on specific issues in a standardized format (rather than merely indicating the presence or absence of disclosure) (Khan et al. 2016).

To build our primary E rating, we use all indicators from MSCI’s environment category to calculate a net environmental score (E Score (Net)), equal to the number of E strengths minus the number of E concerns. We adjust E Score (Net) in two ways to make it more relevant for tests of Hypothesis 2. First, the total number of E indicators rated changes over time, making E Score (Net) less comparable over time. To reduce this source of measurement error, we refine E Score (Net) so that it is based only on E indicators with sufficiently long time-series data (ten or more years). We provide the list of these E indicators, along with their definitions, in Table IA.1 of the Internet Appendix.Footnote 27

Second, MSCI attempts to cover a comprehensive set of environmental issues, not all of which are directly related to toxic releases generated by production processes. Given our focus on releases generated by production processes, when testing Hypothesis 2 we partition the E indicators into two groups: those related to and those unrelated to toxic releases generated by production processes. The first group includes pollution prevention (Env-Str-B), recycling (Env-Str-C), hazardous waste (Env-Con-A), and regulatory compliance (Env-Con-B). The second group includes manufacturer of beneficial products and services (Env-Str-A), communications (Env-Str-E), and manufacturer of agricultural chemicals (Env-Con-E). For each group, we calculate a net environmental score equal to the number of strengths minus the number of concerns: TR-E Score (Net) for the first group and NTR-E Score (Net) for the second (see Appendix B for details). We expect TR-E Score (Net) to be more relevant for testing Hypothesis 2, whereas NTR-E Score (Net) serves as a placebo used in falsification tests.
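As a rough sketch of the two net scores, the snippet below counts strengths minus concerns within each group of indicators. The indicator codes are those listed above; the 0/1 flag layout and the example firm-year are hypothetical, and the sketch assumes each indicator is coded 1 when present and 0 otherwise.

```python
# Indicator groups as listed in the text.
TR_INDICATORS = {"Env-Str-B", "Env-Str-C", "Env-Con-A", "Env-Con-B"}
NTR_INDICATORS = {"Env-Str-A", "Env-Str-E", "Env-Con-E"}

def net_score(flags, indicators):
    """Number of strengths minus number of concerns within a set of indicators.

    flags: dict mapping indicator code -> 0/1 for one firm-year (assumed coding).
    Missing indicators are treated as 0 in this sketch.
    """
    strengths = sum(flags.get(k, 0) for k in indicators if k.startswith("Env-Str-"))
    concerns = sum(flags.get(k, 0) for k in indicators if k.startswith("Env-Con-"))
    return strengths - concerns

# Hypothetical firm-year: one beneficial-products strength, one pollution
# prevention strength, and one hazardous waste concern.
flags = {
    "Env-Str-A": 1, "Env-Str-B": 1, "Env-Str-C": 0, "Env-Str-E": 0,
    "Env-Con-A": 1, "Env-Con-B": 0, "Env-Con-E": 0,
}

tr_e_score = net_score(flags, TR_INDICATORS)    # 1 strength - 1 concern
ntr_e_score = net_score(flags, NTR_INDICATORS)  # 1 strength - 0 concerns
print(tr_e_score, ntr_e_score)
```

In this example the pollution prevention strength and the hazardous waste concern offset each other, so the release-related score is zero even though the firm's unrelated score is positive.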

3.2 Sample selection and descriptive statistics

We first collect all plants reporting to TRI that can be linked to firms in the Compustat/CRSP merged database over the 1994 to 2018 period. We use the mapping table from Harvard Dataverse shared by Xiong and Png (2019) to link a plant in TRI to the firm in Compustat/CRSP that owns the plant. We restrict our sample to firm-years after 1994 because of the documented inconsistency between actual and forecasted EPS in I/B/E/S prior to 1994 (e.g., Clement and Tse 2003; Cohen et al. 2007; Kirk et al. 2014). This restriction does not reduce sample size much, as we find little overlap between TRI and Compustat/CRSP before 1994.

Following Akey and Appel (2021) and Kim and Xu (2022), we drop plant-year observations reporting zero values for Toxic Release.Footnote 28 We restrict our sample to firms that have December fiscal year ends to align financial data with TRI data, which are reported by calendar year. We also require observations in our sample to have non-missing values for all control variables used in our regression (1) discussed below. After imposing these data requirements, our final sample includes 39,090 plant-year observations for 5,149 firm-years, representing 4,518 plants for 559 unique firms. We winsorize all non-indicator variables at the 1st and 99th percentiles of their pooled distributions.
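The winsorization step can be sketched as follows. This is only an illustration: the text does not say how percentiles are interpolated, so the linear-interpolation rule used here (via `statistics.quantiles` with `method="inclusive"`) and the function name `winsorize` are assumptions.

```python
import statistics


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clip values to the given percentiles of their pooled distribution.

    The percentile interpolation rule is an assumption; the paper only
    states that variables are winsorized at the 1st and 99th percentiles.
    """
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]
```

Values below the 1st percentile are set to the 1st percentile and values above the 99th percentile are set to the 99th percentile; all other observations are left unchanged.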

Table 1, Panel A describes the industry composition of our sample at the plant-year level, sorted by the number of observations. Consistent with our anecdotal understanding of across-industry variation in pollution levels, chemical & allied products, transportation equipment, and industrial machinery & equipment have the largest number of plant-years discharging toxic chemicals. They are followed by primary metal industries, paper & allied products, petroleum & coal products, fabricated metal products, and electronic & other electrical equipment.

Table 1 Summary statistics

Panels B and C in Table 1 provide summary statistics for plant- and firm-level variables. Each year, the average plant releases 119.84 thousand pounds of toxic chemicals (Toxic Release) and generates 0.10 thousand pounds of nonroutine toxic release (Nonroutine Release), with the latter generally resulting from catastrophic events or accidents. We defer to later sections the discussion of plant-level variables capturing enforcement track records (Past Enforcement Count and Past Settlement Amount) and production levels (Prod Ratio). At the firm level, around 12% of observations in our sample are classified as firms meeting benchmarks based on Suspect. The average E Score (Net) based on E indicators with at least ten years of MSCI coverage over our sample period is 0.24.Footnote 29 The distributions of firm-level control variables (discussed next in Section 4) are comparable to those reported in Caskey and Ozel (2017).

4 Relation between toxic release and meeting earnings benchmarks

4.1 Baseline results for hypothesis 1

To test the relation between meeting earnings benchmarks and toxic releases outlined in Hypothesis 1, we estimate the following regression at both the firm and plant levels.

$$ Ln\left(1+ Toxic\ Release\right)= Suspect+ Controls+ Firm\ or\ Plant\ FE+ Industry\hbox{--} by\hbox{--} year\ FE+\varepsilon $$

Given that Suspect is a firm-level reporting decision, we focus on firm-level analyses for Eq. (1). We also conduct plant-level analyses to confirm that the firm-level results are not due to a few plants with large toxic releases. Finding similar results across plants suggests that the decision to boost earnings is taken at the firm level and then communicated to plants. Plant-level analyses also allow us to examine where pollution occurs within a firm, which enables investigation of predictions about across-plant variation in pollution.Footnote 30 As Toxic Release is right-skewed, we follow prior studies (e.g., Akey and Appel 2021; Chatterji et al. 2009; Kim and Xu 2022) and use the natural logarithm of one plus Toxic Release as the dependent variable in regression (1).Footnote 31 We also follow prior studies and control for the normal level of toxic release by including various firm characteristics that reflect normal production levels. These include the natural logarithm of firm size measured by total assets (Assets); natural logarithm of the ratio of total revenue to total assets (Turnover); property, plant, and equipment scaled by total assets (PPE/Assets); capital expenditures scaled by total assets (CapEx/Assets); market-to-book ratio (M/B); and book leverage (Leverage). We also include nonroutine release (Nonroutine Release) to control for the effects of one-time pollution events.
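To make the role of the log transformation and the firm fixed effects concrete, the sketch below estimates a stripped-down version of regression (1). It is a simplification under stated assumptions, not the specification estimated in the paper: it uses a single regressor (Suspect), firm fixed effects only, and omits the controls, the industry-by-year fixed effects, and the clustered standard errors. The function name `within_slope` and the tuple input format are hypothetical.

```python
import math
from collections import defaultdict


def within_slope(rows):
    """Within-firm (fixed-effects) slope of Ln(1 + Toxic Release) on Suspect.

    `rows` is a list of (firm_id, suspect, toxic_release) tuples.
    Simplified sketch: one regressor, firm fixed effects only.
    """
    by_firm = defaultdict(list)
    for firm, suspect, release in rows:
        # log1p(x) = ln(1 + x), the dependent variable in regression (1).
        by_firm[firm].append((float(suspect), math.log1p(release)))

    sxy = sxx = 0.0
    for obs in by_firm.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        # Demeaning within firm removes all time-invariant firm effects.
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    # Requires at least one firm whose Suspect status varies over time.
    return sxy / sxx
```

Because each firm's observations are demeaned, the slope is identified only from firms that are suspect in some years and not in others, which is the within-firm comparison described in the text.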

Following prior studies (e.g., Greenstone 2002; Akey and Appel 2021; Tomar 2019), we include firm (plant) fixed effects and industry-by-year fixed effects in firm-level (plant-level) regressions. The firm/plant fixed effects purge the estimated effects of meeting benchmarks of all persistent firm/plant characteristics that determine toxic releases. The industry-by-year fixed effects remove all transitory differences in toxic releases across industries. As Greenstone (2002) explains, these fixed effects are important because macroeconomic changes over time (e.g., oil crises, recessions, and increases in foreign competition) have different effects on different manufacturing industries. While TRI provides data on 334 toxic chemicals, the mean (median) number of chemicals reported is 20.9 (13) at the firm level and 7.5 (5) at the plant level. We cluster all standard errors at the firm level and present significance for two-tailed tests.

Panel A of Table 2 presents firm and plant-level estimates of regression (1) in columns (1) and (2), respectively. As we include firm (plant) fixed effects, the coefficients on Suspect capture incremental toxic releases for suspect firm-years relative to non-suspect firm-years for the same firm (plant). Consistent with Hypothesis 1, we find significantly higher toxic releases for Suspect firm-years in both columns of Table 2, Panel A. The magnitude of the coefficient on Suspect is larger in the firm-level regression than in the plant-level regression because firm-level pollution combines pollution across plants for each firm.

Table 2 The effect of meeting earnings benchmarks on toxic release

In terms of economic magnitude, the estimates from the firm-level regression reported in column (1) suggest that real earnings management to meet benchmarks leads to about a 15% increase in toxic release relative to the mean (median) level of Toxic Release reported in Table 1, Panel C.Footnote 32 In addition to confirming H1, this evidence is consistent with our rationale for H1: pollution increases when managers cut pollution abatement costs to boost earnings in order to meet earnings benchmarks.Footnote 33 Regarding control variables, the coefficients on Assets and Turnover are positive and significant, consistent with both variables capturing total assets and sales (two measures of scale). Nonroutine Release is also positively associated with Toxic Release at the plant level, suggesting that pollution due to one-time events should be controlled for.

As discussed in Section 2.1.1, two TRI toxins (ozone and lead) are subject to real-time monitoring because the EPA includes them among the six criteria air pollutants. When meeting earnings benchmarks, firms are less likely to increase releases of these two toxins, relative to other TRI toxins, because the costs of increased pollution are considerably higher. To shed light on this issue, we re-estimate regression (1) separately for released amounts of ozone and lead. Consistent with the costs of pollution being higher for ozone and lead, the estimates for Suspect in columns (1) and (3) of Table 2, Panel B are insignificant.Footnote 34 In contrast, Suspect continues to be significantly positively associated with the released amounts of other TRI toxins, as shown in columns (2) and (4) of Table 2, Panel B. The different results observed for ozone and lead confirm that the costs of increased pollution are considerably higher for those two toxins, and sufficiently high to deter managers from cutting back on their pollution control costs to meet earnings goals.

4.2 Robustness of baseline results

In this section, we examine if the main result in Table 2—our finding that toxic releases increase during Suspect firm-years—is robust to alternative measures of toxic release and Suspect.

4.2.1 Alternative measures of toxic release

We examine the effect of meeting benchmarks for two alternative measures of toxic release. The first measure is toxicity-weighted chemical release (Toxicity-Weighted Release), which accounts for variation in the chronic health effects of chemicals reported to TRI. The second measure is Toxic Release scaled by revenues (Toxic Release/Sales), which allows for potential non-linearity in the relation between toxic release and production levels.Footnote 35 We replace Toxic Release in regression (1) with Toxicity-Weighted Release and Toxic Release/Sales and report the regression results in Table 3, Panel A.

Table 3 Robustness analyses: effect of meeting earnings benchmarks on toxic release

We continue to find a significant positive correlation between Suspect and Toxicity-Weighted Release, suggesting that the effect of meeting benchmarks documented in Table 2 is not mainly driven by chemicals that are relatively benign. Similarly, Suspect and Toxic Release/Sales are also significantly positively associated, suggesting that our finding is unlikely to be due to insufficient control for scale. To investigate the potential concern that our results so far are driven by a few chemicals in a handful of industries, we repeat the analyses separately for a subset of 20 chemicals with the largest release quantities in our sample, as well as for the remaining chemicals that are not as prevalent. Our results in Table IA.3 of the Internet Appendix indicate significant positive slopes on Suspect for both subgroups.

We also examine potential variation in the effect of meeting benchmarks on toxic release across pollution media: air emissions, water discharge, and solid waste disposal. Table IA.4 of the Internet Appendix shows that the effect of meeting earnings benchmarks is concentrated in air emissions and is absent for water discharge and solid waste disposal. One potential explanation for this finding is that toxic air emissions may be less salient (harder to trace and less visible) than toxic water discharges and ground disposal.

4.2.2 Alternative measures of consensus analyst forecasts used to construct suspect

We next consider different measures of consensus forecasts. In Table 3, Panel B, we redefine Suspect using the average of each analyst’s latest forecast issued within a wider [−180, −4] day window prior to the earnings announcement date (Suspect (180)). Like Suspect, which is based on forecasts issued within the [−90, −4] window, Suspect (180) is significantly positively associated with Toxic Release.
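The construction of the consensus and of Suspect described above can be sketched as follows. The window logic (each analyst's latest forecast within a [−90, −4] or [−180, −4] day window, averaged across analysts) and the 0 to 2 cent range follow the text; the function names, the tuple input format, and the rounding of the forecast error to guard against floating-point noise are assumptions made for illustration.

```python
from datetime import date


def consensus(forecasts, announce, window=(90, 4)):
    """Average of each analyst's latest forecast issued within the window.

    `forecasts` is a list of (analyst_id, forecast_date, eps_forecast);
    `window=(90, 4)` means 90 to 4 days before the announcement date.
    Returns None if no forecast falls in the window.
    """
    latest = {}
    for analyst, fdate, eps in forecasts:
        days_before = (announce - fdate).days
        if window[1] <= days_before <= window[0]:
            if analyst not in latest or fdate > latest[analyst][0]:
                latest[analyst] = (fdate, eps)
    if not latest:
        return None
    return sum(eps for _, eps in latest.values()) / len(latest)


def suspect(actual_eps, consensus_eps, max_beat=0.02):
    """1 if actual EPS meets or beats the consensus by two cents or less."""
    # Rounding to 4 decimals is an assumption to avoid floating-point noise.
    error = round(actual_eps - consensus_eps, 4)
    return int(0.0 <= error <= max_beat)
```

Passing `window=(180, 4)` to `consensus` yields the wider-window consensus underlying Suspect (180).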

As discussed earlier, one assumption motivating the use of Suspect to capture earnings management is that managers manipulate real activities during the year based on their expectations of both performance during the year and the final analyst consensus against which actual EPS is compared (benchmark). Managers likely ignore consensus forecasts available earlier in the year because the forecasts are generally biased upward and are based on less information than is available to managers. In effect, those earlier consensus forecasts measure managers’ expectations of the final consensus with error, and that measurement error is larger earlier in the year. When consensus forecasts are used to compute Suspect, the relation with toxic release should therefore become successively weaker with earlier forecast dates.

To test the above prediction, we collect the consensus forecast of annual EPS (constructed by I/B/E/S in its summary files) at the end of each fiscal quarter as well as just before the earnings announcement, and redefine Suspect relative to each of these consensus forecasts. Consistent with earlier forecasts being a noisier proxy for the final consensus managers expect during the year, the results in Table 3, Panel C show that the relation between Suspect and Toxic Release weakens monotonically as the date of the consensus moves back in time from the month right before earnings announcement to the end of the first fiscal quarter. The relation is equally strong and significant for the month prior to the earnings announcement and the end of Q4, but declines in magnitude and significance for consensus forecasts available as of the end of Q3, Q2, and Q1. Note that these results have no implications for the timing of the release of toxins, as the pollution data we use only provide the total amount released over the year.

Our final analysis in Table 3 examines the pattern of pollution levels across different levels of forecast error. Table 2 shows that pollution is higher if forecast errors lie between 0 and 2 cents, relative to the average for all other forecast errors combined. However, H1 makes a sharper prediction: pollution levels should increase only for the narrow range of forecast errors corresponding to meet or just beat. The results in Table 2 are also consistent with other patterns. For example, pollution could increase nonlinearly with performance, with pollution levels for meet/just beat considerably higher than those for negative forecast errors but only slightly lower than those for large positive forecast errors.

We construct three other indicator variables based on forecast error relative to the latest consensus forecast used to measure Suspect: Large Miss equals one if the forecast error is more negative than −3 cents, as in Caskey and Ozel (2017); Beat 2-5cents equals one if the forecast error is more than 2 cents but no more than 5 cents; Large Beat equals one if the forecast error is more than 5 cents. We add Large Miss, Beat 2-5cents, and Large Beat as three additional explanatory variables to regression (1). In effect, these indicator variables, along with Suspect, describe pollution levels for the respective ranges relative to the level for a “just miss” base group with negative forecast error higher than −3 cents.
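The partition of forecast errors into these ranges can be sketched as below. The cutoffs follow the definitions in the text; the function name `forecast_error_bins` and the convention of expressing the forecast error (actual EPS minus consensus) in cents are assumptions for illustration.

```python
def forecast_error_bins(error_cents):
    """Return the four indicators for a forecast error expressed in cents.

    All four indicators are zero for the "just miss" base group, i.e.,
    errors that are negative but not more negative than -3 cents.
    """
    return {
        "Large Miss": int(error_cents < -3),
        "Suspect": int(0 <= error_cents <= 2),
        "Beat 2-5cents": int(2 < error_cents <= 5),
        "Large Beat": int(error_cents > 5),
    }
```

Because the ranges are mutually exclusive, at most one indicator equals one for any observation, and each coefficient is interpreted relative to the omitted base group.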

Results reported in Panel D of Table 3 show that only the coefficient on Suspect is significantly positive, and the coefficients on Large Miss, Beat 2-5cents, and Large Beat are all insignificant. While firms that meet or just beat consensus forecasts release more toxic chemicals than the base group that just misses the benchmark, firms that substantially miss the benchmark or comfortably beat the benchmark do not. The clear pattern observed in these results is inconsistent with the possibility that pollution levels increase with forecast error (better unexpected performance).

5 The moderating effect of environmental sustainability

To test Hypothesis 2 (H2) regarding the moderating effect of a firm’s E rating on the relation between meeting benchmarks and toxic releases, we estimate the following regression:

$$ Ln\left(1+ Toxic\ Release\right)= Suspect+ Suspect\times E\ Score\ (Net)+E\ Score\ (Net)+ Controls+ Firm\ FE+ Industry\hbox{--} by\hbox{--} year\ FE+\varepsilon $$

where E Score (Net), our primary measure of E rating, reflects the net effects of the strengths and concerns in a firm’s environmental indicators taken from MSCI. We estimate regression (2) on firm-year observations, as E indicators are only available at the firm level. H2 predicts a negative coefficient on the interaction term Suspect × E Score (Net).

Panel A of Table 4 presents estimates from regression (2). Results in Column (1) are for E Score (Net) based on all E indicators available, and the results in Column (3) are for E Score (Net) based on indicators with at least ten years of MSCI data. As discussed in Section 3.1, E Score (Net) in Column (3) is more comparable over time because it excludes E indicators with sparse coverage. Contrary to H2, we find a significantly positive coefficient on the interaction term Suspect × E Score (Net) in both columns (1) and (3). This finding suggests that firms with higher E ratings—i.e., firms rated by MSCI as being more environmentally sustainable—increase pollution more, rather than less, than firms with low E ratings when meeting earnings benchmarks.

Table 4 The moderating effect of firm-level environmental ratings

To further confirm the unexpected positive coefficient on Suspect × E Score (Net), we use the S&G components of ESG indicators as a placebo test. Given that the dependent variable is toxic releases, S&G ratings should have little incremental explanatory content over E ratings. We construct a corresponding social and governance net score (S&G Score (Net)) by subtracting the number of concerns for community, diversity, employee relations, human rights, product, and corporate governance categories from the number of strengths for these categories. We add S&G Score (Net) and Suspect × S&G Score (Net) as additional explanatory variables to regression (2) and report the results in columns (2) and (4) of Table 4, Panel A. If the coefficient on Suspect × E Score (Net) reflects some general features of ESG ratings that are unrelated to E-S, we expect to see similar results for S&G Score (Net). If, however, the relation between pollution and meeting benchmarks is solely a function of E ratings, the coefficient on the interaction with S&G Score (Net) should not load. We find the interaction between Suspect and S&G Score (Net) is indeed insignificant in columns (2) and (4).Footnote 36

To provide additional evidence on the role of E ratings in the positive relation between toxic release and meeting earnings benchmarks, we decompose E Score (Net) in columns (3) and (4) of Panel A into TR-E Score (Net) and NTR-E Score (Net), reflecting factors that are and are not directly related, respectively, to toxic releases generated by production processes. We expect the moderating effect of E Score (Net) on the link between Suspect and Toxic Release to be driven by TR-E Score (Net). The components of the E rating that are unrelated to production processes, NTR-E Score (Net), should be less relevant for the toxic releases that we examine here. We re-estimate regression (2) by replacing E Score (Net) with TR-E Score (Net) and NTR-E Score (Net) and report the results in column (1) of Panel B. Consistent with our expectation, the interaction between Suspect and TR-E Score (Net) is significantly positive, but the interaction between Suspect and NTR-E Score (Net) is insignificant.

We confirm that this result for the net effect of factors that are directly related to toxic releases generated by production processes is also observed separately for strengths and concerns. We further decompose TR-E Score (Net) into TR-E Score (Strength) and TR-E Score (Concern), which are defined as the number of strengths and concerns, respectively, among E indicators directly related to toxic releases generated by production processes. By definition, TR-E Score (Net) = TR-E Score (Strength) − TR-E Score (Concern). The results reported in Column (2) of Table 4, Panel B suggest that both TR-E Score (Concern) and TR-E Score (Strength) have a significant moderating effect on the link between Suspect and Toxic Release.

We consider alternative explanations for our surprising result relating to H2. Why do firms with higher E ratings, especially for indicators directly related to toxic releases from production processes, pollute more when facing short-term earnings pressure? The explanation that is most consistent with results from the additional analyses we conduct is based on firms with high E ratings building regulatory and reputational “pollution slack,” which allows them to increase pollution at a lower cost when it is needed to boost earnings. This slack could result from more emission allowances (generated by lower toxic releases in the past), which allow firms to temporarily increase pollution without exceeding regulatory limits. The slack could also arise from a better reputation, built over time among stakeholders (e.g., consumers, communities, and investors), that excuses the firms for temporary increases in pollution.Footnote 37

One key assumption underlying our pollution slack explanation is that the MSCI data provide reliable E-S measures: higher E ratings reflect better environmental track records on toxic releases. To test this assumption, we construct the following measures of environmental track records using EPA data: toxic releases (Toxic Release) over the past two years; the number of EPA enforcement cases on plants owned by the firm over the past five years (Past Enforcement Count); and the dollar amount of settlements for all EPA enforcement cases on plants owned by the firm over the past five years (Past Settlement Amount). Higher values of these measures indicate worse environmental track records. We regress each of these measures on TR-E Score (Strength), TR-E Score (Concern), NTR-E Score (Strength), and NTR-E Score (Concern) to determine if these four measures of E ratings reflect environmental track records.Footnote 38

Table 5 presents the regression results for environmental track records. Consistent with our assumption, TR-E Score (Concern) is significantly positively associated with all four track record measures. TR-E Score (Strength) is significantly negatively associated with Toxic Release in the past two years, but not significantly related to past enforcements and settlements. In contrast, NTR-E Score (Strength) and NTR-E Score (Concern) are generally unrelated to the track record measures. These findings indicate that the E ratings based on MSCI indicators directly related to toxic releases capture reasonably well the track records of firms’ toxic releases. They also suggest that MSCI relies on EPA data to construct its E indicators.

Table 5 Environmental ratings and environmental impact track records

As a final test of the pollution slack explanation, we examine within-industry-year and within-firm-year variation in plant-level measures of track records for toxic releases. Exploiting within-firm-year variation is a tighter specification that asks the following question: When firms decide to increase pollution to meet earnings benchmarks, are they more likely to increase pollution in plants with better toxic release track records (i.e., larger pollution slack)? We construct plant-level versions of Past Enforcement Count and Past Settlement Amount based on enforcement cases for each plant. We also construct a dummy variable, High Past Pollution, that equals one (zero) if a plant’s toxic release in the previous year is higher (lower) than the sample median for that year. We modify regression (2) by replacing E Score (Net) with each of these three plant-level track record variables and estimate it at the plant level with plant fixed effects.
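A minimal sketch of the High Past Pollution indicator follows. Two details are assumptions, because the text leaves them open: the median is taken over all sample plants in the previous year, and a plant whose release exactly equals that median is coded zero. The function name and dictionary input format are hypothetical.

```python
import statistics
from collections import defaultdict


def high_past_pollution(releases):
    """Flag plant-years whose previous-year release exceeds that year's median.

    `releases` maps (plant_id, year) -> toxic release. Returns a dict
    mapping (plant_id, year) -> 1/0 for plant-years with an observed
    previous-year release.
    """
    by_year = defaultdict(list)
    for (_, year), amount in releases.items():
        by_year[year].append(amount)
    medians = {year: statistics.median(vals) for year, vals in by_year.items()}

    flags = {}
    for plant, year in releases:
        prev = releases.get((plant, year - 1))
        if prev is not None:
            # Ties with the median are coded zero (an assumption).
            flags[(plant, year)] = int(prev > medians[year - 1])
    return flags
```

The resulting indicator would then be interacted with Suspect in the plant-level version of regression (2).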

Panel A of Table 6 presents the results for High Past Pollution, and Panel B presents results for Past Enforcement Count and Past Settlement Amount. The left (right) column in each subgroup includes industry-year (firm-year) fixed effects. The coefficients on the interaction term are all negative and significant (at the 10% level or better). These results confirm that plants with better (worse) track records of toxic release, relative to other plants in that industry-year or firm-year, are more (less) likely to cut pollution abatement costs so that the firm can meet earnings benchmarks. The significant results observed for the tighter specification with firm-year fixed effects are consistent with the pollution slack explanation, because firms leverage where the slack resides when deciding to cut pollution abatement costs.Footnote 39

Table 6 The moderating effect of plant-level environmental track record

Our results suggest the following description: while high E rating firms pollute less in general than low E rating firms, they pollute more relative to their historical averages when it is needed to meet earnings benchmarks. This finding contradicts the view that environmental sustainability curbs managerial short-termism. It is also inconsistent with the greenwashing explanation, which holds that firms that obtain high E ratings by exaggerating their environmental efforts also pollute more when meeting earnings benchmarks. Under the greenwashing explanation, firms with high E ratings should pollute more in general, as indicated by worse environmental track records.Footnote 40 Our results are consistent, however, with a more nuanced view: superior environmental track records create pollution slack that reduces the costs of increased pollution to meet earnings benchmarks.Footnote 41 The results also suggest that the E ratings and their component indicators obtained from MSCI are reliable measures of firm efforts to control pollution related to TRI toxins.

6 Additional analyses

6.1 The moderating effect of environmentally sustainable investing

As a potential driver of the increasing demand for corporate sustainability, the practice of sustainable investing (SI), in which investors select stocks rated high on ESG, has experienced unprecedented growth in the US and Europe (Grewal and Serafeim 2020; Gillan et al. 2021). How then does SI affect the relation between meeting benchmarks and toxic release? On the one hand, SI takes a long-term view and discourages managerial short-termism by deemphasizing quarterly earnings (Starks et al. 2020; Cao et al. 2021). The view that SI curbs short-termism predicts a weaker link between pollution and meeting benchmarks for firms with higher institutional ownership by funds that are focused on E-S (E-S IO).

On the other hand, institutions that are focused on E-S likely screen and select stocks based on E ratings and environmental track records (Amel-Zadeh and Serafeim 2018; Curtis et al. 2021; Heath et al. 2021). Moreover, we find that firms with high E ratings and strong environmental track records are more likely to increase pollution to boost earnings when meeting earnings benchmarks. If so, we expect to find a stronger link between meeting benchmarks and pollution for high E-S IO firms. For convenience, we label this opposing view the “pollution slack view.”

To test these competing views, we measure E-S IO following the methodology of Cao et al. (2021) and Brandon et al. (2021). We first identify E-S institutional investors among 13F institutions based on the E-S footprint of the firms they invest in. We then calculate, for each sample firm, the percentage of institutional ownership held by these E-S investors (see Appendix B for details). To estimate the moderating effect of E-S IO, we replace E Score (Net) in regression (2) with E-S IO. We estimate this modified regression at the firm level, as E-S IO is defined at the firm-year level. The results presented in column (1) of Table 7 indicate a significantly positive coefficient on the interaction term Suspect × E-S IO, consistent with the pollution slack view.

Table 7 The moderating effect of environmentally sustainable institutional holding

Similar to Table 4, we confirm this inference with a placebo test that is based on a measure of sustainable institutional ownership derived from social & governance indicators (S&G-S IO) and calculated similarly to E-S IO. We expect weaker results for S&G-S IO because it reflects other priorities, separate from E-S. Column (2) presents results for the regression with S&G-S IO and Suspect × S&G-S IO as two additional explanatory variables. Again, consistent with the pollution slack view and with E ratings being more relevant than S&G ratings, we find an insignificant coefficient on Suspect × S&G-S IO.

6.2 Relation between meeting benchmarks and SO2 emission

Liu et al. (2021) show that meeting analyst forecasts is associated with higher SO2 emission for Chinese SOEs controlled by the central government but not for locally controlled SOEs or non-SOE firms. Given that our sample of U.S. firms is closest to non-SOE firms in China, we investigate whether the firms in our sample, which increase their toxic releases to meet earnings benchmarks, also increase their SO2 emissions. As SO2 is one of the six criteria air pollutants highlighted by the EPA, we do not expect our firms to increase SO2 emissions, because SO2 is considerably more hazardous and more closely monitored than TRI toxins. Evidence consistent with this prediction has already been shown in Table 2 for ozone and lead, two other criteria pollutants.

To test our prediction, we collect annual plant-level data on SO2 emissions from the EPA’s Clean Air Markets Division (CAMD) and National Emissions Inventory (NEI) databases (Shive and Forster 2020). Panel A of Table 8 presents descriptive statistics. The mean (median) SO2 emission at the plant level is 591.17 (0.36) thousand pounds in our sample, indicating considerable right skewness in SO2 emission. We re-estimate regression (1) for SO2 emission and report the results in Panel B of Table 8. Consistent with our prediction, and consistent with the results for non-SOE Chinese firms in Liu et al. (2021), Suspect is not associated with SO2 emissions. This finding is also consistent with the inference in Liu et al. (2021) that the positive relation they find between meeting benchmarks and SO2 emission might be unique to Chinese central SOEs because they face low regulatory costs, given that they are effectively exempt from environmental regulation and enforcement.

Table 8 The effect of meeting benchmarks on sulfur dioxide emissions

6.3 Other ways to meet earnings benchmarks by managing forecast errors

We investigate whether firms that cut pollution abatement costs also engage in other ways to manage the forecast errors underlying earnings benchmarks. Forecast errors can be altered by managing earnings and guiding forecasts (Dechow et al. 2010). Earnings management can be achieved by altering activities (REM) and by altering accruals (accruals-based earnings management or AEM). Our results are provided below.

First, we examine whether our sample firms use four other REM methods to boost earnings to meet benchmarks: overproduction (Roychowdhury 2006), cutting SG&A expense (Roychowdhury 2006), cutting R&D expense (Baber et al. 1991; Bushee 1998), and cutting advertising expense (Cohen et al. 2010). To do so, we re-estimate regression (1) by replacing the dependent variable Toxic Release with two measures of production (Prod Ratio and Production), SG&A expense (SG&A/Sales), R&D expense (R&D/Sales), and advertising expense (AD/Sales).Footnote 42 Prod Ratio is based on actual plant-level production data from the EPA’s Pollution Prevention (P2) database. The P2 database provides the ratio of this year’s output or outcome of processes in which chemicals are used (Akey and Appel 2021) to the corresponding value for last year.Footnote 43 We measure Prod Ratio at the plant level as the average of the EPA’s production ratios across all chemicals released that year from that plant. Production is a firm-level estimate of production levels obtained from financial reports, defined as the sum of cost of goods sold and change in inventory scaled by lagged total assets (Dechow et al. 1998; Roychowdhury 2006). We believe that Prod Ratio is more reliable than Production because it directly measures production. SG&A/Sales, R&D/Sales, and AD/Sales are also obtained from firm-level financial reports.
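The two production measures defined above reduce to simple arithmetic, sketched below. The formulas follow the verbal definitions in the text; the function and argument names are hypothetical, and an equal-weighted average across chemicals is assumed for Prod Ratio.

```python
def production(cogs, inventory, lagged_inventory, lagged_assets):
    """Firm-level Production: (COGS + change in inventory) / lagged total assets."""
    return (cogs + (inventory - lagged_inventory)) / lagged_assets


def prod_ratio(chemical_ratios):
    """Plant-level Prod Ratio: average production ratio across chemicals.

    `chemical_ratios` holds one current-to-prior-year production ratio per
    chemical released by the plant that year (equal weighting is assumed).
    """
    return sum(chemical_ratios) / len(chemical_ratios)
```

For instance, a firm with cost of goods sold of 500, an inventory increase of 20, and lagged total assets of 1,000 has Production of 0.52.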

The results reported in Table 9 indicate that the coefficient on Suspect is insignificant across columns (1)–(5), suggesting that our sample firms do not use other types of REM to meet earnings benchmarks. The insignificant coefficient estimates on both production variables are important because the overproduction explanation (e.g., Roychowdhury 2006) is particularly relevant here: overproduction increases pollution and product costs are substantial for our sample of manufacturing firms.

Table 9 The effect of meeting benchmarks on other types of real earnings management

To confirm further that the increased pollution comes from cutting variable pollution abatement costs, we examine two implications. One implication is that the cost of goods sold should decline for Suspect firm-years. To test this prediction, we replace the dependent variable in regression (1) with sales-deflated cost of goods sold (COGS/Sales). The regression results, in column (6) of Table 9, confirm that COGS/Sales is significantly lower for Suspect firm-years.Footnote 44 Declines in COGS/Sales without corresponding declines in production are consistent with Suspect firms cutting back on pollution abatement costs. Finding that our sample firms cut pollution control costs without engaging in other forms of REM might be partially due to pollution abatement costs being substantial relative to other costs, such that cutting those costs has the potential to alter per share earnings sufficiently to meet benchmarks. Regardless, we acknowledge that our findings might be sample-specific and may not generalize to other firms.

The other implication is that cutting pollution abatement costs to boost short-term earnings is more feasible when those costs are mostly variable. To proxy for the variable/fixed nature of pollution abatement costs, we use the approach taken in the management accounting literature to estimate the fixed/variable nature of firm-level costs (e.g., Banker et al. 2014). Consistent with our prediction, the results in Table IA.6 of the Internet Appendix suggest that the positive link between toxic release and meeting earnings benchmarks is concentrated (absent) in firms with cost variability higher (lower) than the industry-year median.
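The sample split described above can be sketched as follows. This is a minimal illustration, assuming a firm-year cost variability score has already been estimated; the estimation itself (e.g., Banker et al. 2014) is not reproduced here. The field names and sample values are hypothetical, and treating ties at the median as "not higher" is an assumption of the sketch.

```python
from collections import defaultdict
from statistics import median


def flag_high_variability(firm_years):
    """Flag firm-years whose cost variability exceeds the industry-year median.

    firm_years: list of dicts with keys 'firm', 'industry', 'year', and
    'cost_variability' (hypothetical field names for this sketch).
    Returns {(firm, year): True if strictly above the industry-year median}.
    """
    groups = defaultdict(list)
    for row in firm_years:
        groups[(row["industry"], row["year"])].append(row["cost_variability"])
    medians = {key: median(values) for key, values in groups.items()}
    return {
        (row["firm"], row["year"]): row["cost_variability"]
        > medians[(row["industry"], row["year"])]
        for row in firm_years
    }


# Hypothetical firm-years in a single industry-year, for illustration only.
sample = [
    {"firm": "a", "industry": "chem", "year": 2010, "cost_variability": 0.2},
    {"firm": "b", "industry": "chem", "year": 2010, "cost_variability": 0.5},
    {"firm": "c", "industry": "chem", "year": 2010, "cost_variability": 0.8},
]
high_variability = flag_high_variability(sample)
```

Regression (1) would then be estimated separately on the flagged and unflagged subsamples.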

Turning to accruals-based earnings management, we expect a stronger link between pollution and meeting the earnings benchmark for firms with limited opportunity to manage accruals, as indicated by balance sheets that are bloated by accumulated prior efforts to boost earnings via accruals (Barton and Simko 2002). Consistent with this prediction, we show in Table IA.7 of the Internet Appendix that the positive association between Suspect and Toxic Release is stronger among firms with higher net operating assets, a proxy for reduced capacity to use accruals management.

Regarding expectation management, we find no evidence that downward expectation management, measured by downward revisions of management forecasts and downward guidance of the consensus analyst forecast over the year, moderates the positive relation between Suspect and Toxic Release (see Table IA.8 of the Internet Appendix).

7 Conclusion

We examine two key pillars supporting the increasing demand for corporate environmental sustainability as a market-based solution to environmental externalities. First, managerial short-termism exacerbates negative externalities. Second, environmental sustainability curbs managerial short-termism and the resulting negative environmental impact.

We test the first proposition by using earnings management to meet earnings benchmarks as a measure of managerial short-termism, and environmental pollution as a measure of its negative externality. Consistent with this proposition, we find that firms attempting to meet earnings benchmarks significantly increase their release of toxic chemicals. This finding suggests that firms cut pollution abatement costs to boost earnings, as the short-term benefits of meeting earnings expectations exceed the long-term costs of higher pollution.

We test the second proposition by examining if the positive relation between pollution and meeting earnings benchmarks is attenuated for firms with higher environmental ratings. To our surprise, our evidence contradicts this proposition: firms that are viewed as more environmentally sustainable are in fact more, not less, likely to cut pollution abatement costs to meet earnings benchmarks. We show, however, that firms with higher E ratings have better environmental track records. This suggests an alternative explanation: better environmental track records create slack that lowers the costs of occasionally increasing pollution to boost earnings when needed to meet earnings benchmarks.

Our first set of results suggests that earnings management not only adversely affects stakeholders within the boundaries of the firm but also impacts society at large. This finding contributes to the accounting literature on real earnings management as well as the economics literature on the determinants of environmental pollution. Our second set of results contributes to the literature on environmental sustainability and sustainable investing by questioning the view that environmental sustainability curbs short-termism. Firms with better environmental track records and firms held by more institutional investors that are focused on environmental sustainability are more likely to increase pollution to achieve short-term earnings benchmarks.

As is typical, some caveats are in order. First, data availability limits our ability to tie the pollution–meeting earnings benchmark link to cuts in pollution abatement activities. To make that connection, we would need access to data on the relative costs and benefits of abatement technologies, including how easy it is to switch abatement processes on and off, the cost savings from cutting pollution abatement, and the costs of increased pollution. This data limitation prevents us from conducting cross-sectional tests based on abatement technologies for our first research question. Second, while we find evidence consistent with the pollution slack explanation for our second research question, we acknowledge that there could be other explanations.