Inferring Economic Impacts from a Program’s Physical Outcomes: An Application to Forest Protection in Thailand

Economists typically estimate the average treatment effect on the treated (ATT) when evaluating government programs. The economic interpretation of the ATT can be ambiguous when program outcomes are measured in purely physical terms, as they often are in evaluations of environmental programs (e.g., avoided deforestation). This paper presents an approach for inferring economic impacts from physical outcomes when the ATT is estimated using propensity-score matching. For the case of forest protection, we show that a protection program’s ex post economic impact, as perceived by the government agency responsible for protection decisions, can be proxied by a weighted ATT, with the weights derived from the propensity of being treated (i.e., protected). We apply this new metric to mangrove protection in Thailand during 1987–2000. We find that the government’s protection program avoided the loss of 12.8% of the economic value associated with the protected mangrove area. This estimate is about a quarter smaller than the conventional ATT for avoided deforestation, 17.3 percentage points. The difference between the two measures indicates that the program tended to be less effective at reducing deforestation in locations where the government perceived the net benefits of protection as being greater, which is the opposite of the relationship that would characterize a maximally effective program.


Introduction
Economists routinely use treatment effects models to evaluate government programs' causal impacts on observed outcomes (Imbens and Wooldridge 2009;Athey and Imbens 2017). Identifying causal relationships requires addressing various potential sources of bias, including non-random selection, confounding, and spillover effects. Economists have developed a variety of quasi-experimental impact evaluation methods to address these biases, including matching, instrumental variables, regression discontinuity, and difference-in-differences. These methods implicitly assume that the economic outcomes of interest are observable. In some settings, however, economists only observe outcomes measured in physical terms, which might lack well-defined economic interpretations. Although impact evaluations using physical outcomes might be feasible, their results do not necessarily serve as useful proxies for programs' economic impacts. This paper presents a method for inferring the economic impact of a program when outcomes are observed in physical terms.
We present this method in the context of forest conservation and demonstrate its empirical application by evaluating a forest zoning system in Thailand that aimed at protecting mangrove forests. Mangroves are coastal ecosystems located in the tropics and subtropics that provide a wide range of benefits to local and global populations. In a series of papers, Mäler and colleagues listed these benefits as including the production of shellfish, firewood, timber, and game; feeding and nursery areas for commercially important fish; shoreline and storm protection; nesting, resting, and feeding sites for birds; and carbon sequestration (Mäler et al. 1996(Mäler et al. , 2008(Mäler et al. , 2009Arrow et al. 2000). A large literature confirms the economic importance of these benefits (Himes-Cornell et al. 2018), including in Thailand (Barbier 2007). 1 Private land owners, however, often have incentives to convert forests, including mangroves, to more profitable land uses. Since the free market likely fails to supply the socially optimal amount of forest, governments often intervene by implementing forest conservation programs, including protected areas, payments for ecosystem services, and community-based forest management. Most impact evaluations of these programs use forest cover as the outcome variable and aim to measure a program's causal effect on avoided deforestation (Blackman 2013;Lan and Yin 2017). An impact evaluation of this type defines a program's average treatment effect on the treated (ATT) as the difference between the realized forest cover change in treated areas-i.e., sites where the program was implemented-and the counterfactual forest cover change that would have occurred in the same areas had they not been treated.
The ATT for avoided deforestation is a good proxy for a forest conservation program's economic impact only if the benefits and costs of conservation are homogeneous (Vincent 2016). Environmental economics theory has recognized that environmental values can instead vary spatially and that government programs to protect the environment must account for this variation since at least Mäler's (1974) seminal treatise. There is abundant evidence that both the benefits (e.g., carbon sequestration, Asner et al. 2010; biodiversity habitat, Gibson et al. 2011;Le Saout et al. 2013; watershed services, Pattanayak and Kramer 2001;Brauman et al. 2007) and the costs (Naidoo et al. 2006;Polasky 2008) of conservation are highly heterogeneous. Mäler et al. (2008;2009) emphasize that the economic benefits of maintaining mangroves are "very case sensitive" and can vary greatly by location due to differences in ecological, economic, and institutional conditions. A program that increases forest cover by, for instance, 20% thus does not necessarily increase forest value by 20% too. Depending on the distribution of benefits and costs across the sites included in a program, the ATT for avoided deforestation can either overestimate or underestimate the ATT for a program's economic impact.
We develop an indirect way to estimate the ATT for a forest conservation program's economic impact by building on work by Heckman (2010), who emphasized the structural framework embedded in treatment effects models. We employ a discrete-choice Roy (1951) model to describe the process of selection into treatment. One can view this model as a random utility model (RUM;McFadden 1973;Manski 1977) and parametrically estimate the effects of forest attributes on behavior in response to a conservation program. A byproduct of this procedure is the estimated propensity score. We show that under certain conditions, a transformation of the propensity score has a utility interpretation, where utility is as perceived by the government agency that decides which forests to include in the program. We use this transformation as a weight to construct a utility-weighted ATT. This new metric can be viewed as a generalized form of the ATT for avoided deforestation, and it offers a better measure of the ATT for a forest conservation program's economic impact, at least as perceived by the government.
Our method connects the impact evaluation literature to the non-market valuation literature. Propensity score matching is a common impact evaluation technique, and random utility models are widely applied in valuing non-market goods (Freeman et al. 2014). We show how to combine these two approaches and exploit their respective advantages. In doing so, we contribute to filling a gap in the environmental policy analysis literature, in which valuation and evaluation are disconnected (Ferraro et al. 2012).
Our empirical analysis focuses on mangrove protection in Thailand. In 1987, recognizing both the high value of Thailand's mangroves and their rapid loss (Charuppat and Charuppat 1997), the Thai government introduced a zoning system for managing the nation's mangroves (Havanond 2008). This system assigned mangroves to different categories that varied in terms of restrictions on forest use, including conversion to non-forest land uses. We evaluate the zoning system using both the ATT for avoided deforestation and the utility-weighted ATT that we develop, and we compare the results. The utility-weighted ATT indicates that the system avoided the loss of 12.8% of the economic value associated with the protected mangrove area. This impact is about a quarter smaller than the conventional ATT for avoided deforestation, 17.3 percentage points. The difference between the two measures indicates that the zoning system tended to be less effective at reducing deforestation in locations where the government perceived the net benefits of protection as being greater. This relationship is the opposite of what one would expect for a maximally effective program.
The rest of the paper is organized as follows. Section 2 presents the theoretical foundations and econometric aspects of our method. Section 3 applies the method to the mangrove zoning system in Thailand. The last section discusses the interpretation of the utility-weighted ATT and its limitations as an evaluation metric.

Deriving and Estimating a Utility-Weighted ATT
Impact evaluations are program-specific, and the empirical strategy one should use to identify program impacts depends on program features and the data structure. Although our proposed method can in principle be applied to non-environmental programs, to explain its derivation and estimation we refer to the evaluation of a hypothetical forest protection program, with propensity score matching as the evaluation strategy. We begin with an intuitive explanation of the method before presenting details on its derivation and estimation.

The Intuition for a Utility-Weighted ATT
Avoided deforestation (Q), a binary variable that equals 1 if deforestation does not occur (i.e., forest is retained) and 0 otherwise, is the most widely used outcome variable in the impact evaluation literature on forest protection programs. To estimate the ATT for avoided deforestation ( ATT Q ), economists first observe the realized forest outcomes for a sample of protected ("treated") sites, then use a technique such as matching to construct for each treated site a counterfactual (i.e., the hypothetical forest outcome had the site not been treated), and finally compute an unweighted average of the differences between the realized outcomes and the counterfactuals across all treated sites. In this approach, all treated sites have the same weight; they are not weighted by the utility gains from protecting them. Hence, the total amount of effective protection-protection that actually results in avoided deforestation-affects the ATT, but which sites are protected effectively does not. Table 1 presents a simple numerical example of the evaluation of a program that protects three sites. We consider three hypothetical cases in which protection is effective at only one of the sites. When all sites have the same weight, ATT Q equals one-third in each case, regardless of which site is protected effectively.
Even if the sites have the same area, the utility gains from protecting them could vary. We assume that effective protection generates the least gain at Site A and the greatest gain at Site C. We can incorporate these heterogeneous values into the evaluation by constructing a utility-weighted ATT ( ATT W ), which is lowest when only Site A is protected effectively (Case 1) and highest when only Site C is protected effectively (Case 3). Compared to the conventional evaluation metric ATT Q , this utility-weighted metric ATT W conveys more information on the economic performance of protection. As Table 1 illustrates, ATT W can be larger than, smaller than, or equal to ATT Q , depending on which sites are protected effectively and how utility gains vary across them.
Site-level protection values are often not directly observable. We develop a new approach to overcome this data limitation within an impact evaluation framework by inferring the values in an indirect way. If the government protection agency (the "regulator") makes rational protection decisions, then we can model their decision process by a RUM and estimate each site's propensity score for being protected. A higher propensity score implies a higher expected utility gain as viewed by the regulator. Below, we formalize this approach and show how to derive site-level utility weights based on the propensity score and use them to estimate ATT W.

Behaviors of the Land User and the Regulator
We use the Roy model to explain forest-related behavior first for a private land user and then for a regulator. Assume there is a collection of forest sites, with individual sites distinguished by i, and for simplicity a single land user who decides whether to deforest each site. Q i = 0 if deforestation occurs and Q i = 1 otherwise. Unless necessary for clarity, we suppress the site index i henceforth to simplify notation. X denotes a set of site characteristics known to both the land user and the regulator. In the absence of regulation, the land user's private utilities with and without deforestation are given respectively by Y L 0 and Y L 1 , U L 0 and U L 1 denote utility effects known only to the land user; both have expected values of zero. These effects could include the land user's idiosyncratic knowledge of market conditions for products extracted from mangroves or supplied by non-mangrove land uses, or non-market uses of mangroves unknown to the regulator.
In the presence of regulation, a rational deforestation decision implies where with D being the treatment status ( D = 1 if the site is protected, D = 0 otherwise) and Q 1 and Q 0 being deforestation outcomes with and without protection, respectively. The regulator (and econometricians) can observe Q but not Y L 0 − Y L 1 , the private utility gain if deforestation occurs. As written, Eq. (3) includes no penalty for noncompliance with the protection regulation. Absent such penalty, the land user has no incentive to modify their behavior compared to the unregulated market setting.
Turning to the behavior of the regular, we initially follow the mangrove management example in Arrow et al. (2000) by assuming the regulator aims to increase social welfare, which mirrors Mäler's (1974) assumption of an environmental management agency with knowledge of society's preferences. Recognizing the existence of a social benefit from retaining forest that the land user ignores, the regulator has an incentive to intervene and correct the market failure by protecting sites. The regulator is rational and protects a site if and only if the social benefit from retaining forest exceeds the expected private gain from deforestation (i.e., the expected opportunity cost of protection), The social benefit is on the left side of the inequality and has two components: S (X) , which is determined by observed site characteristics; and U S , which is the regulator's private knowledge about the social benefit and has an expected value of zero. The right side shows the expected private gain from deforestation, L 0 (X) − L 1 (X) , multiplied by a scalar k R that converts the expected private gain to the same units as the social benefit; note that it does not account for the unobserved (by the regulator and econometricians) component of private utility. We can rewrite this expression as where the second line uses more compact notation to relabel terms on the first line. We refer to this expression as the selection equation.
To induce compliance, regulation imposes a penalty, such as a fine or lost profits, that the land user must pay if they are caught deforesting ( Q = 0 ) a protected site ( D = 1 ). We assume the land user can form an unbiased estimate of these costs, C > 0 , by multiplying their subjective probability of being caught by a prediction of the penalty. Because C is estimated by the land user, it is not observed by the regulator (or econometricians). Its addition changes Eq. (3) to which we can rewrite as We group U and C on the right hand side of the inequalities to highlight that they pertain to the land user's private information.

Evaluating the Program's Net Economic Impact
We now shift the perspective from the regulator's ex ante protection decisions to ex post evaluation of the protection program. If the outcome variable is forest status Q, then the individual treatment effect can easily be defined based on Eq. (8), To interpret i , we categorize all treated sites into three groups: S1: sites that will not be deforested regardless of being treated or not, i.e., Q 1i = Q 0i = 1; S2: sites that will be deforested regardless of being treated or not, i.e., Q 1i = Q 0i = 0; S3: sites that will not be deforested only if they are treated, i.e., Q 1i = 1 and Q 0i = 0. 2 From Eq. (5), the regulator views the net economic gain from protecting site i as the difference between the site's social benefit, which is lost if the site is deforested, and the expected private gain from deforestation. Denote this difference by ΔY i . For sites in group , the land user voluntarily chooses not to deforest even in the absence of regulation, and so protection causes neither a gain in social benefits nor a loss in land user utility: For sites in group S2, the land user always deforests. Protection still does not cause a gain in social benefits, but now it causes a loss in land user utility due to the additional cost of deforestation, C, in the presence of regulation: As noted earlier, the regulator does not observe this cost, a point to which we will return. Finally, for sites in group S3, protection causes both a gain in social benefits and a loss in land user utility: The program's overall economic treatment effect is determined by summing the impacts across all sites in all three groups: We have used here the assumption that the expected value of U L i is zero. ΔY can be interpreted as the social welfare gain if the regulator is benign, but it lacks this welfare interpretation if the regulator is not benign. In the latter case, it measures the net economic gain according to the regulator's preferences, which could differ from society's preferences. We will offer some insights into this discrepancy for mangrove protection in Thailand by comparing the government's stated criteria for protection to the factors that our estimated selection equation indicates actually influenced protection decisions.

Empirical Identification of the Utility-Weighted ATT
Empirical identification of ΔY faces four challenges. The first is that C is private information known to the land user but not econometricians. The version of ΔY that can be estimated will consequently omit the first sum in Eq. (13), This expression represents an upper bound on the true value of ΔY . The difference between the estimated and true values of ΔY is smaller if k R C is smaller, which occurs if the regulator places lower weight on land user utility, enforcement is weaker, or the penalty is lower. It is also smaller if C is a fine whose revenue is returned to the public (a transfer payment) or funds another government program that generates an economic gain.
The second challenge is that ΔY has a range of ℝ + . This range prevents an ATT based on it from being compared to the ATT for avoided deforestation. The latter is the expectation of Eq. (9) conditional on treatment, which is just the share of S3 among the treated sites, ATT Q thus ranges from zero to one. We rescale ΔY to the same range by applying the following expression, The denominator represents the maximal utility gain that can possibly be achieved, while the numerator represents the actual gain. The "W" in ATT W indicates that this economic ATT weights treated sites differently from each other, unlike the physical ATT Q (see Table 1). The third challenge is that ΔY and its rescaled version ATT W include direct measures of utility, which econometricians do not observe. Empirical identification of ATT W requires identifying the conditional expectation of the expression in parentheses in it, which we label W i : Despite its inclusion of direct utilities, W i can be identified as follows. In the selection equation (Eq. (6)), we assume that R (X) can be modeled as a linear function of X, X , and that U R follows the standard logistic distribution. These assumptions enable us to identify by estimating the selection equation as a logistic regression model. With identified, we can identify W i by applying the expression , which we show in Appendix 1 is equivalent to W i is monotonically increasing in X i , which implies that, compared to ATT Q , ATT W gives more weight to treated sites associated with higher expected utility gains. It follows that W i is also monotonically increasing in the propensity score from the logistic model,

as substituting this expression into Eq. (19) yields
The final challenge is the familiar one with impact evaluations: the econometrician observes treated sites only in their treated state, not their counterfactual untreated state, and faces an associated selection problem because both treatment status and the treatment effects are functions of X. We address this challenge by using the propensity score to select a matched set of untreated observations as controls for the treated observations. Propensity score matching eliminates selection bias if: (i) selection is only on observables, (Q 1 , Q 0 ) ⟂ D|X (the Conditional Independence Assumption, CIA; Rosenbaum and Rubin 1983); and (ii) the characteristics of treated and untreated sites overlap sufficiently so that the treated sites and their matched controls are statistically indistinguishable, ℙ(D = 1|X = x) ∈ (0, 1) for all x in the support of X (the common support assumption; Ho et al. 2007). The CIA is satisfied if U S ⟂ U L , which requires including all variables that influence selection and outcomes in the estimated selection equation, and full support can be checked empirically by examining the balance of the treatment and control groups.
Empirical identification of ATT W thus involves first estimating the selection equation (Eq. (6)), then using Eq. (21) to estimate W i , and finally using propensity score matching to estimate where Q j is the matched counterpart of Q i . Compared to the estimated version of the ATT for avoided deforestation (Eq. (16)), which is the only difference is the inclusion of the estimated utility weights.
We close with three associated empirical points. First, Appendix 2 shows how the expressions for ̂A TT Q and ̂A TT W given by Eqs. (22) and (23) can be extended to cases in which Q is a continuous, not binary, variable. Second, Appendix 3 shows how to estimate the standard error of ̂A TT W . Third, Eqs (22) and (23) imply that the magnitude of ̂A TT W relative to ̂A TT Q depends on the correlation between the estimated individual treatment effects ( Q i − Q j ) and the estimated propensity scores ( p(X i ) ). ̂A TT W is larger (smaller) than ̂A TT Q if the correlation is positive (negative), and the two measures are identical if the correlation is zero. This relationship enables one to predict the potential range of ̂A TT W for a particular empirical case, by using the estimated individual treatment effects and varying the correlation from −1 to +1. One can therefore not only estimate ̂A TT W but also determine if it is closer to the minimal or maximal values it potentially could have had.

Policy Context
In the 1970s and 1980s, Thailand experienced a boom in shrimp farming, which entailed the complete or near-complete conversion of affected mangrove areas into shrimp ponds. Mean annual mangrove deforestation rose sharply, from 45 km 2 /yr during 1961-1979 to 130 km 2 /yr during 1979-1986(Charuppat and Charuppat 1997. By 1986, nearly half (47%) of the country's original mangrove area had been converted to other land uses, with shrimp farms accounting for a third of the cumulative historical loss (Charuppat and Charuppat 1997). In response to this rapid and large loss, the country's highest governmental authority, the Prime Minister's Cabinet, issued a resolution on December 15, 1987 that established a new national mangrove management system. This system assigned the country's mangroves to three management zones: (1) Protection Zone, which prohibited deforestation, defined as the conversion of mangroves to other land uses, and all resource extraction; (2) Economic Zone A, which prohibited deforestation but allowed sustainable harvesting of wood products by local communities and commercial companies; and (3) Economic Zone B, which allowed both deforestation and extractive uses. The government implemented this system for 13 years. A Cabinet resolution on August 22, 2000 marked its end, by prohibiting the renewal of harvesting contracts in Economic Zone A and any additional deforestation or resource extraction in Economic Zone B. In 2002, the government shifted responsibility for mangrove management from the Royal Forestry Department (RFD) to a new Department of Marine and Coastal Resources (DMCR). We evaluate Thailand's zoning system using both the conventional ATT for avoided deforestation and the utility-weighted ATT, with propensity score matching as the impact evaluation method and satellite data on deforestation during 1987-2000. Five features of the zoning system make it attractive for illustrating the application of the utility-weighted ATT. First, consistent with the description of the "regulator" in Sect. 2.2, the Cabinet can be viewed as a single agent that made a discrete protection intervention. This situation differs from more complex ones with protected areas established by multiple bodies (e.g., national, subnational) at multiple points in time. Second, the system had well-defined start and end dates. The delineation of the pre-treatment and treatment periods and the appropriate endpoint for measuring the system's outcomes are both clear, which is necessary for an impact evaluation. Third, the system included a zone-the Protection Zone-that prohibited deforestation and thus represents a suitable "treatment" for an impact evaluation with avoided deforestation as the physical outcome measure. Although Economic Zone A prohibited deforestation too, it is not suitable for evaluation because it allowed sustainable wood harvesting. Distinguishing harvesting from deforestation is difficult in satellite imagery before a forest regenerates from harvesting. Including Economic Zone A in the treatment group would risk misclassification of tree-cover loss and biased estimates of treatment effects.
Fourth, using the Protection Zone as the treatment requires information on the spatial locations of mangroves assigned to it, and such information exists because the 1987 Cabinet resolution approved and endorsed a specific map prepared by the RFD. This map delineated the boundaries of all three zones in the entire national area estimated as originally having been mangrove. Cabinet resolutions from 1982 and 1984 had called for the development of a zoning system (Mangroves for the Future 2011). The RFD's mapping effort stemmed from these resolutions and was completed before the 1987 resolution. The zone assignments remained unchanged during 1987-2000, which implies that using a single selection model-i.e., the first-stage logistic regression model in propensity score matching-to analyze the selection process is appropriate.
Finally, a suitable set of control locations exists because the RFD's 1987 map omitted some mangroves. We interviewed two senior officers in the DMCR who were members of the RFD team that prepared the map, 3 and they reported that these omissions were inadvertent. A likely reason is the relatively low resolution of the satellite data that the RFD used for mapping mangroves in the 1980s (1:250,000; Charuppat and Charuppat 1997). Lying entirely outside the zoning system, these omitted mangroves represent a purer control group than do mangroves in Economic Zone B, where deforestation was allowed but required the RFD's permission.
To our knowledge, the zoning system has never been evaluated using a causal-inference method. In fact, some literature mischaracterizes it as consisting of two zones, not three (Aksornkoae 2004). Case studies of small numbers of villages dominate the literature on mangrove management in Thailand (e.g., two villages, Sudtongkong and Webb 2008; four villages, Barbier 2008 andKongkeaw et al. 2019). They lack the representativeness necessary for evaluating the national performance of the zoning system, but they offer reasons to doubt the system's effectiveness at reducing deforestation. They highlight a gap between its progressive aspects, which on paper signaled a policy shift away from conversion toward conservation and gave local communities a greater role in managing mangroves than did corresponding policies for terrestrial forests (Kongkeaw et al. 2019), and its implementation, which by 1996 included communities in the management of barely 5% of the total zoned area (Charuppat and Charuppat 1997). They document disputes between communities and the government over ownership of mangroves, which created de facto open access (Sudtongkong and Webb 2008;Barbier 2008), and a lack of response by the government when communities reported illegal conversion or harvesting (Sudtongkong and Webb 2008). Consistent with this last point, the DMCR officers we interviewed reported that monitoring and enforcement of the system were weak due to inadequate staffing and budget and that penalties, if assessed at all, were typically low.
Despite these deficiencies, the system might not have been completely ineffective: (Charuppat and Charuppat 1997) reported that mean annual mangrove deforestation fell by nearly 80% during the decade after the 1987 Cabinet resolution compared to the decade before. There is thus merit in determining the system's effectiveness during its entire implementation period, in both biophysical and economically more relevant terms.

Data and Covariate Construction
We highlight key features of our data here, with full details on sources and processing reported in Appendix 4. We obtained a digital map showing mangrove presence/absence for Thailand in 2000 from Giri et al. (2005Giri et al. ( , 2011. We created a corresponding 1987 map using the same type of satellite data (Landsat), same techniques, and same resolution (30 meter) used to create the 2000 map. Using these maps, we drew a random sample of 50,000 points from locations where mangroves were present in at least one of the two years (see Fig. 1). We set the minimum distance between any two sampled points at 50 meters to prevent drawing multiple points from a single pixel. Given our focus on avoided deforestation, we dropped all points where mangroves were not present in 1987. 4 We classified the treatment status of the sampled points using a shapefile from the DMCR that showed the zone boundaries from RFD's 1987 map. The map delineated the zones at a fine scale (Fig. 2). The Protection Zone included 524 distinct polygons (noncontinguous areas), which were on average very small (mean and median areas = 0.28 km 2 and 0.026 km 2 , respectively). It accounted for only 4% of the total zoned area. Overlaying the 1987 mangrove cover map on the zone map revealed that the total area of the unzoned mangrove area omitted from the zone map was equivalent to 26% of the total zoned area. Overlays also revealed that the cumulative 1987-2000 deforestation rate was much lower in the Protection Zone (17%) than in the unzoned area (48%). Although the difference in these rates suggests that assignment to the Protection Zone might have cut deforestation by nearly two thirds, determining the actual impact requires controlling for selection bias through estimation of the ATT for avoided deforestation.
We retained points in the Protection Zone as the treatment group and points in the unzoned area as a pool of potential controls. The 1987 Cabinet resolution stated specific criteria for assigning mangroves to the Protection Zone. Table 2 lists these criteria, which reflect various environmental public goods associated with mangroves and thus imply a social welfare foundation for the criteria. We based our choice of covariates for the selection model on these criteria, the interview with the DMCR officers, and the matching literature on protected areas (PAs). The officers reported that the RFD relied most heavily on criteria 1.6 (existing PAs) and 1.10 (proximity to river banks and coastlines) in preparing the zone map. This reliance is understandable for two reasons. The first is access to spatial information: according to the DMCR officers, the RFD used official PA maps included in the Royal Gazette notices that established the PAs, and it used high-quality military topographical maps to measure distances. Second, these two criteria overlap with several others, with 1.6 subsuming 1.4, 1.5, 1.8, and 1.9 and 1.3 providing the coastal-protecton rationale for 1.10. We dropped points that were in PAs established before the Cabinet resolution (criterion 1.6), as they did not represent incremental treatment by the zoning system. These points accounted for 49% of the treatment group. We incorporated criterion 1.10 by constructing two distance variables, one for distance to the nearest river and the other for distance to the coast. 5 Regarding the remaining criteria, we incorporated 1.7 by constructing two dummy variables, which distinguished points with higher and lower average wind speeds, respectively, Areas vulnerable to erosion such as beaches, sand dunes, and mudflats 1.4 Areas of historical and archaeological importance 1.5 Areas with locally unique features 1.6 National parks, forest parks, tourism areas, wildlife sanctuaries, and non-hunting areas 1.7 Areas that should be preserved as wind breaks 1.8 Areas that should be preserved for research purposes 1.9 Areas that should be preserved for environmental ecosystem protection 1.10 Areas that are within 20 meters of natural water channels such as rivers and natural canals and are not more than 75 meters from the coast line from points with intermediate average wind speeds, and a dummy for points located outside the monsoon climate zone. 6 We incorporated 1.1 and 1.2 by constructing an index of the number of mangrove-resident animal species whose ranges encompassed each point. The 1987 Cabinet resolution did not state comparable criteria for Economic Zones A and B, but its authorization of these two zones implies that assignment of mangroves to the Protection Zone was also influenced by the opportunity cost of protection, with protection being less likely in locations where resource extraction and conversion to non-mangrove land uses were more valuable to local communities or industries. Consideration of the opportunity cost of protection is consistent with a net social welfare concept that accounts for more than just the public good benefits of protection reflected in the stated criteria in Table 2. It is also consistent with the PA matching literature, which provides ample evidence that various socioeconomic variables can affect either protection or avoided deforestation and thus are important to include in a selection model. We included a dummy variable to identify points on the Gulf of Thailand coast, where the shrimp farm industry was more heavily concentrated (Charuppat and Charuppat 1997). Although Thailand's original mangrove area was nearly evenly split between the two coasts (Charuppat and Charuppat 1997), our 1987 mangrove map revealed that the Gulf coast's share had fallen to barely a third (36%) by that year. This concentration implies a higher opportunity cost of protection on the Gulf coast and thus a lower likelihood of protection there. We also included distance variables of the types commonly used to measure proximity to features that affect the opportunity cost of protection (Pfaff et al. , 2015: distances to the province capital and district (amphoe) seat, distance to the nearest road, and a pair of dummy variables that distinguished points nearer to and farther from the forest edge from points at an intermediate distance from the edge. Following Miteva et al. (2015), we also included population density for the subdistrict (tambon) where a point was located and the fraction of subdistrict area that was in mangrove in 1987, 7 both of which can likewise be expected to affect the opportunity cost of protection. Most of these socioeconomic variables can be given a political interpretation in addition to a purely economic one (i.e., opportunity cost). For example, the Gulf dummy could proxy for the political influence of the shrimp farm industry. For this reason, statistical significance of these variables in the estimated selection model could reflect political influence over zoning decisions, which could cause the decisions to deviate from welfare-maximizing ones.
We did not include two variables commonly used in the PA evaluation literature, slope and elevation, because mangroves occur primarily in intertidal zones where variation in these factors is negligible. 8 The total number of covariates in our selection model, 14, Matching requires the common support assumption: covariates must have sufficient overlap between the treatment group and the untreated group. For each covariate, we identified untreated points with values more than 5% beyond the range of treated points and dropped them from the sample. Our final data set included 7,790 points, 671 of them treated. The pool of potential controls was thus large relative to the number of treatment points, which is advantageous for matching. Table 3 lists the covariates and their means and standard deviations for the treated and untreated points. Substantial differences can be observed for many of the variables, which highlights the need for matching to construct a balanced sample before estimating the treatment effects.

Evaluation Based on ATT for Avoided Deforestation
We used matching with replacement, so that each untreated point could be used as the best match ("nearest neighbor") for multiple treated points. We set the caliper-the maximum permissible difference between the propensity score for a treated point and its matched control-at 0.01. Application of the caliper reduced the number of treated points by a small amount, from 671 to 653 if we matched each treated point with a single control point and 646 if we matched it with two.
We evaluated covariate balance using the two most common diagnostics in the matching literature: the standardized mean difference between the treatment and control groups, and the ratio of variances for the two groups (Austin 2009). Formal statistical tests of these diagnostics are not valid in propensity score matching (Imai et al. 2008), so we instead applied guidelines recommended in the literature. We required the absolute value of the standardized mean difference to be below 0.1 (Normand et al. 2001;Austin 2009) and the variance ratio to be within the 2.5th and 97.5th percentiles of an F-distribution with degrees of freedom implied by the number of matched treatment-control pairs (Austin 2009). These percentiles were 0.86 and 1.17 in our sample. With the number of matched controls per treated point set initially at one, one covariate, Wind speed: high, was unbalanced according to the standardized mean difference (0.148) and barely balanced according to the variance ratio (1.165) (see Appendix 5). In response, we increased the number of matched controls to two, which eliminated the imbalance (Table 4). We used two matches in all subsequent analyses. The literature on PA evaluation emphasizes the need to address spillover effects, which can bias estimates of treatment effects (Robalino et al. 2017;Herrera et al. 2019). There are two types of spillovers, leakage and blockage (Fuller et al. 2019). In the case of leakage, PA establishment displaces deforestation that would have occurred inside a PA to an unprotected location where deforestation would not have occurred in the absence of protection. Conversely, in the case of blockage, PA establishment increases the cost of deforestation in unprotected locations and reduces the amount of deforestation that would have otherwise occurred in those locations. Failure to control for leakage and blockage can result in overestimation and underestimation, respectively, of PAs' effectiveness in reducing deforestation.
When investigating local spillovers, the definition of "local" depends on the geographical scale of the PA in question. Because mangroves naturally occur in narrow coastal strips, Thailand's zoning system delineated the polygons for the Protection Zone at a much finer scale (see Fig. 2) than, say, the scale of PAs commonly analyzed in countries such as Brazil (Herrera et al. 2019) and Costa Rica (Robalino et al. 2017). As noted earlier, the mean and median areas of Protection Zone polygons were substantially smaller than 1 km 2 . We therefore expect spillover effects to occur primarily within a few kilometers of the corresponding polygons. To investigate the magnitude of spillovers, we first estimated ATT Q , the treatment effect for avoided deforestation, using all 646 matched pairs as a baseline. Then, we progressively excluded untreated points that were within a 1 km, 2 km, 3 km, 4 km, or 5 km buffer of a Protection Zone polygon and, at each step, reestimated the selection model, formed a new set of controls in view of the reduced pool of potential controls, and reestimated ATT Q . Table 5 displays the results. Imposing the buffers only slightly affected the estimates of ATT Q , and the confidence intervals overlap for all the estimates. We therefore conclude that spillover effects were negligible and that the no-buffer case yields the preferred estimate of ATT Q. In addition to the lack of evidence of spillovers, we prefer the no-buffer case because it includes the largest number of treated points, which falls to only 437 for the 5 km buffer. It also has the best balance: the number of covariates that violate the balance conditions rises from zero for this case to 3, 2, 2, 6, and 7, respectively, for the 1 km, 2 km, 3 km, 4 km, and 5 km buffers. This deterioration in match quality is due to the shrinking pool of potential controls.
From Table 5, the estimated ATT Q for the no-buffer case has a highly significant (p < 0.01) value of 0.173. This estimate indicates that 17.3% of the Protection Zone prevented deforestation that would have occurred in the absence of protection. It is barely half of the 31-percentage point difference between the observed 1987-2000 deforestation rates for the Protection Zone and the unzoned area reported earlier, which underscores the need to control for selection bias when evaluating forest protection programs: due to selection bias, the observed difference in deforestation rates exaggerates the Protection Zone's impact on avoided deforestation. Nevertheless, the positive and significant value of the estimated ATT Q confirms that creation of the Protection Zone reduced mangrove deforestation during 1987-2000. 9

Evaluation Based on Utility-Weighted ATT
To implement our new evaluation approach based on ATT W , we estimated individual propensity scores for all treated points using the same logistic regression that served as the first stage of our matching estimator. We used the no-buffer sample given its superior balance and the lack of evidence of spillover effects. Table 6 presents the regression results. As discussed in Sect. 3.2, the first six covariates pertain to assignment criteria for the Protection Zone stated in the 1987 Cabinet resolution. Most of them are not significant at even p < 0.1. Wind speed: low is significant at that level and has the expected negative sign. Animal species is highly significant (p < 0.01) but is unexpectedly negatively signed: mangroves were less likely to be assigned to the Protection Zone if they had more species. In sum, we found little evidence that the stated criteria influenced treatment.
We found more evidence that socioeconomic covariates mattered. As expected, the Gulf dummy had a negative effect (p < 0.1). The probability of being treated was higher for interior mangroves and lower for mangroves near the forest edge (p < 0.01 for both) and lower for locations with higher population density (p < 0.05), which mirrors results in previous matching studies on PAs (e.g., Joppa and Pfaff 2009;Miteva et al. 2015). We cannot determine whether the significance of these variables reflects rational consideration of the opportunity costs of protection, which would be consistent with the general design of the zoning system, or instead reflects political influences that caused treatment to deviate from social welfare maximization. The lack of evidence that the stated criteria for the Protection Zone mattered inclines us to believe that treatment decisions were not purely apolitical, however.
Using these results, we followed the approach explained in Sect. 2.3 to construct the utility weights for all treated observations in the sample. As explained in Sect. 2.4, the potential range of ATT W for the Protection Zone. In effect, we can simulate a more complex, and more realistic, case than the simple example given in Table 1. Holding ATT Q constant at its estimated value and using the utility weights we constructed, we found that ATT W attains a maximum value of 0.749 if the two series are perfectly positively correlated and a minimum value of -0.514 if they are perfectly negatively correlated. If we rule out protection causing deforestation of mangroves that would not be deforested in the absence of protection (see Sect. 2.3), then the minimum value becomes zero. ATT W therefore has a potential range from zero to nearly four times ATT Q (0.749 vs. 0.173). This wide range, straddling ATT Q , demonstrates that the ATT for avoided deforestation can be a highly misleading proxy for the economic impact of a forest conservation program. Our estimate of the utility-weighted ATT turned out to be near the lower end of the potential range, 0.128. This estimate is approximately a quarter (26%) less than the estimated ATT for avoided deforestation and is highly significant (p < 0.01). Figure 3, which plots quantiles of the predicted individual treatment effects for points in the treatment group against the corresponding predicted propensity scores, illustrates that the estimate of ATT W is smaller than the estimate of ATT Q for the reason just given: i.e., the two series are negatively correlated. 10 The two measures have different interpretations: while the estimate of ATT Q indicates that approximately 17% of the Protection Zone prevented deforestation that would have occurred in the absence of protection, the estimate of ATT W indicates that the Protection Zone protected approximately 13% of the economic value, as perceived by the government, that would have been lost if all mangroves in the Protection Zone had been deforested. They are not alternative measures of the same quantity.
The difference in magnitude between the two measures suggests that interpreting ATT Q as a proxy for the Protection Zone's economic impact would overestimate the impact by roughly a third (35%). The difference is not significant, however (p = 0.16). The lack of significance is due primarily to the relatively large standard errors of ATT Q and ATT W , 0.024 Fig. 3 Distribution of the average treatment effect for avoided deforestation ( ATT Q ) by quantiles of the predicted propensity score (20 quantiles) and 0.041, respectively, which in turn has two causes. One cause that is common to both estimates is the relatively small number of treated points in the sample. Tripling the number, from 646 to 1,938, would cause the test statistic to become significant at p < 0.04. The other cause, which pertains to ATT W , is that unequal weights can yield less precise estimates than equal weights (see Appendix 3). The weights included in the estimate of ATT W are unequal, having a coefficient of variation of 1.46.

Conclusion
Propensity score matching is one of the most widely applied methods for identifying and estimating treatment effects models, especially in the forest conservation literature. In its standard application, the first stage of a propensity score matching study serves only as a statistical step for predicting an index, the propensity score, that is subsequently used to match treated observations to controls. We have argued that when the outcome variable is a binary variable that characterizes behavioral responses by agents to a government program, an economic interpretation is embedded in the first-stage analysis as it reflects the regulator's preferences in assigning treatment. We exploit this economic interpretation to develop a method, based on utility weights derived from the predicted propensity scores for treated observations, that indirectly evaluates the economic impact of a program whose observed outcomes are measured in physical, not economic, terms. In so doing, we connect the impact evaluation literature to the non-market valuation literature.
If the regulator's preferences align with society's, then the utility-weighted ATT generated by our method has a social welfare interpretation. Of course, such alignment does not necessarily occur in practice. For the case of mangrove protection in Thailand during 1987-2000 that we use to demonstrate the method's empirical application, we show how one can gain insight into the degree of alignment by first reviewing the stated criteria for treatment and then comparing those criteria to estimation results for the selection model. In the Thailand case, the stated criteria for protecting mangroves plausibly aligned with social welfare, as they accounted for both a range of environmental public goods supplied by mangroves and the opportunity costs of protection. On the other hand, selection into treatment was generally not significantly influenced by covariates representing environmental public goods; instead, it was mainly influenced by covariates that represented opportunity costs and possibly political influences. We would thus argue that the utility-weighted ATT in this case measures the economic gain from treatment valued according to government preferences that did not align perfectly with social preferences.
The utility-weighted ATT expands the toolkit that applied economists have for evaluating the impacts of government programs, forest conservation programs in particular. Although we have focused on the evaluation of programs whose observed physical outcomes are binary, we have shown how it can be extended to continuous outcomes (Appendix 2). Nonetheless, it has limitations. First, in the absence of information on enforcement actions against noncompliance with a program, it is best interpreted as an upper bound on economic impacts. This bias is probably small in the context of mangrove protection in Thailand during 1987-2000, as enforcement was reportedly weak and penalties low. Second, derivation of the utility weights assumes that treatment is assigned at a single point in time by a single government regulator. This assumption fits well the empirical context of the intervention we evaluate-the 1987 Cabinet resolution in Thailand-but we acknowledge that interventions in other contexts can be implemented at different times and by different regulators. Future research could work on relaxing this limitation by extending the method to multiple treatments, as has been done for example with difference-in-differences models. We note that mangrove management policy in Thailand continued to evolve after 2000 (Beresnev et al. 2016) and that our findings for protection during 1987-2000 should not be assumed to apply to the subsequent period. Future research could also work on relaxing a third limitation: the method models a situation where the utility impacts of treatment are spatially independent. The benefits of forest protection are instead often spatially interdependent, with protection of site i supplying greater benefits if site j is also protected. Examples include watershed conservation and wildlife habitat. Nonenvironmental treatments might similarly have spatially interdependent impacts.
The Thailand application reveals a fourth limitation: distinguishing estimates of the utility-weighted ATT and the ATT for avoided deforestation from each other is less feasible in smaller samples. Both estimates were highly significant individually, thus indicating that protection reduced deforestation and had a positive economic impact (as valued by the government). Moreover, the utility-weighted ATT was substantially smaller than the ATT for avoided deforestation. Yet, the two estimates were not significantly different. To estimate the difference between them more precisely, we would have needed a larger number of treated observations to overcome the large standard errors that can result from the variation in the utility weights.
We close by noting a difference in how a government might respond to a low estimate of the utility-weighted ATT compared to a low estimate of the physical ATT, when the government's objective is to maximize social welfare. The response to a low estimate of the physical ATT could include shifting program targets to alternative sites (or recipients), in order to increase a program's impact as measured in physical terms. The theoretical framework for the utility-weighted ATT does not imply a similar option to retarget a program, because it assumes that the government made the socially optimal choice of program targeting when it selected sites (or recipients) for treatment. If a program's impact as measured by the utility-weighted ATT is low, then increasing its impact is purely a matter of strengthening enforcement, not retargeting. Viewed from another angle, this difference implies that retargeting a program in response to a low estimate of the physical ATT might not be welfare-improving, particularly when the estimates of the physical ATT and the utility-weighted ATT are very different.

Appendix 1: Derivation of the Weight W i
For convenience, we simplify notation by denoting U S i by U i and X i by x. The cumulative distribution function (cdf) of U, evaluated at a specific value u, is given by the standard logistic distribution, with corresponding probability density function (pdf) is the expectation of the distribution x + U i truncated at x + U i > 0 . Since U i has an independent and identical distribution, we can further simplify notation by dropping the index i and defining Z(x) = x + U . The distribution of Z(x) is the distribution of U shifted to the left by x . Since U has a standard logistic distribution, we can write the pdf of Z(x) as The corresponding cdf for Z(x) is The probability that Z(x) > 0 is therefore Now, we can derive the expectation of the right tail of Z(x), truncated at Z(x) > 0: Denoting A = e x and v = 1 1+Ae −z , we have and therefore which corresponds to Eq. (19) in the text.

Appendix 2: Analytic Framework for ATT W with continuous Q
Here we consider the case in which the observed physical outcome Q is continuous, such as hectares of forest. This outcome has an unobservable, heterogeneous social welfare value, U i (Q) . Instead of estimating the ATT in physical terms, we are interested in estimating the ATT in welfare terms, We proceed by assuming U i (Q) has the following structure: A policy maker who acts on behalf of society does not observe but has an implicit or explicit model for predicting it by using a set of covariates Z, The first moment of the difference of these two expressions, , is ex ante expected welfare (as viewed by the policy maker) from treating site i when the physical outcome increases by one unit. Assuming treatment can only change Q in one direction (e.g., it can only increase, not decrease, forest area), the policy maker treats i ( D i = 1 ) if and only if i > 0 . We can further decompose i into two components, i = X i + i (Z i |X i ) , where X is a subset of Z that is correlated with the physical outcome Q and is observable by econometricians. This decomposition implies that the CIA is satisfied: D|X ⟂ (Q i,D i =1 , Q i,D i =0 ) . We can use propensity score matching to control for X, and we can estimate the weight, W i = [ i | i > 0] , using observable data on D and X. These steps allow us to rewrite which corresponds to the expression derived in the text for a binary Q.

ATT W
Estimating the standard error of matching estimators is challenging. It requires establishing an asymptotic approximation to the distribution of matching estimators, which is difficult because the estimators are non-smooth functions of the distribution of matching variables. Furthermore, Abadie and Imbens (2008) have shown that, for a fixed number of matches, bootstrapping is not valid for matching estimators. We therefore follow the alternative method for estimating standard errors of treatment effect estimators that Abadie and Imbens (2006) developed. First, we rewrite Eq. (22) as We treat ŵ i as given, i.e., ŵ i = w i , so the variance of ̂A TT W , conditional on the covariates and the treatment indicators, can be expressed as This procedure is similar to the way that Abadie and Imbens (2006) establish the variance of the conventional ATT matching estimator. We need to estimate 2 1 (X i ) for all i in the treatment group and 2 0 (X j ) for all j that are used as the matched controls. Fortunately, these estimates do not need to be consistent. Abadie and Imbens (2006) suggest estimating them using matching within the treatment group and the control group, respectively, and we implement their method.
In a more recent study, Abadie and Imbens (2016) show that the variance of matching estimators must be adjusted for the fact that propensity scores are estimated prior to matching. They derive the adjustment term, and they develop a procedure to estimate it. The adjustment term, denoted as 2 adj , is based entirely on the first-stage regression that estimated the propensity scores, which is identical to our first stage in estimating ̂A TT W . Therefore, we can directly add the adjustment term to the estimation of the variance of ̂A TT W given by Eq. (41): When ̂2 1 (X i ) and ̂2 0 (X j ) are homogeneous across i and j, the variance of ̂A TT W is always larger than the variance of the corresponding ̂A TT Q due to the presence of unequal weights. In that particular case, the economically more meaningful measure of a program's welfare impact that ̂A TT W provides compared to ̂A TT Q comes at the cost of reduced precision. In the general case, however, the variance of the individual treatment effect can be heterogeneous. Consequently, the variance of ̂A TT W can be larger, smaller, or equal to the variance of ̂A TT Q , depending on how weights are assigned.
The variance formula for ̂A TT W developed above has a limitation, as it treats ŵ i as given weights. Analyzing the distribution of a weighted average of random variables when the weights themselves are random is a challenging issue that remains on the frontier of statistical research (Tang and Yuan 2014;Roozegar and Soltani 2015). We do not attempt to solve that issue in this paper.

Mangrove Cover
We acquired GIS data on 2000 mangrove cover in Thailand from the Global Mangrove Forest Distribution, v1 (2000) dataset (Giri et al. 2005(Giri et al. , 2011. This dataset is in raster format, with a 30-meter pixel size. It codes mangrove cover as a binary variable with 1 indicating mangrove presence and 0 indicating absence. We created corresponding 30-meter resolution data on 1987 mangrove cover using the same source of satellite images (Landsat) and the same techniques used by Giri et al. (2011). We obtained Top-Of-Atmosphere (TOA) Landsat images for 1987 from the Google Earth Engine archive. If 1987 images were not available or were obscured by clouds, then we substituted images from 1986 or 1988. Such substitution represented less than 5% of the data used. We analyzed the images in the cloud computing environment of Google Earth Engine. We selected approximately 10,200 training points for three surface-cover classes: mangrove, non-mangrove, and water. We based selection of training points on several factors, including a shapefile for the boundaries of the zoning system (see below), very high-resolution satellite data (WorldView, GeoEye), and the spectral signature of the Landsat data (bands 1-5 and 7). We classified the Landsat data using a supervised classification approach with a random forest classifier. We performed the classification separately for 19 regions of interest (ROIs) that spanned the entire coastal region of Thailand. We classified the ROIs one at a time, with training images selected from within the same ROIs to improve classification accuracy. We edited the data in ERDAS IMAGINE, removing obvious errors such as false classification of mangroves in hilly areas. We converted the edited data so that it included the mangrove class only, and as a final step we mosaicked all the ROIs together to form a single, 1-bit dataset for the entire country. This dataset is available from the corresponding author.

Treatment Status
The Thailand Department of Marine and Coastal Resources (DMCR) provided a shapefile containing the boundaries of the Protection Zone, Economic Zone A, and Economic Zone B from the RFD map associated with the 1987 Cabinet resolution. We used GIS to overlay the boundaries onto our sample of points. We used a binary variable to code the treatment status, with observations in the Protection Zone coded as the treated group ( D = 1 ) and observations not in any of the zones coded as the untreated group ( D = 0).

Covariates
We used tools in ArcMap to generate the covariates in the selection model. Throughout the analysis, we used GCS_WGS_1984 for geographical coordinates and Azimuthal Equidistant as the projection. We used the latter projection to minimize distance measurement errors caused by map distortions. We set the latitude-longitude of the center of the Azimuthal Equidistant projection at 10 • N , 100 • E , which is approximately at the center of our study area.
Distance to river. We measured distance to the nearest river using a stream map of Thailand available on Marc Souris's research website. 11 We visually checked the map by overlaying it on the Google Earth base map, which revealed that it omitted many tidal streams near river mouths. Before taking the distance measurements, we manually edited the map to add tidal streams shown on the Google Earth base map.
Distance to coast. We evaluated two sources of data on Thailand's coastline: the shoreline map in the Global Self-consistant Hierarchical, High-resolution Geography (GSHHG) database (Wessel and Smith 1996); 12 and a digital map of administrative boundaries in Thailand that we obtained from the DMCR. We selected the latter because it matched much more closely with the mangrove zone boundary map. We visually checked its accuracy by overlaying it on the Google Earth base map. This comparison prompted us to manually edit one part of the administrative boundaries map (i.e., Thailand's international boundary with Myanmar along the Kra Buri River). We then used ArcMap to calculate the distance between each point in the sample and the nearest coastline. Because some mangroves can be submerged even during low tide, some points were classified as being beyond the coastline. We recoded distances for these points as 0 (i.e., directly on the coastline).
Wind speed. We drew data on wind speed from the Global Wind Atlas 2.0. 13 We extracted the 50-meter aboveground wind speed as a raster file from this database and overlaid it on our sample of points. To improve balance, we discretized the variable by converting it to a pair of dummy variables, one for points with wind speeds above the 75th percentile of the treated group (Wind speed: high) and the other for points with wind speeds below the 25th percentile (Wind speed: low). We also considered including a variable on cyclone frequency, but global data on cyclone frequency show little variation in Thailand (CHRR 2005, Dilley et al. 2005. Climatic zones. We generated categorical dummy variables for climatic zones. We manually coded a map of the World K öppen Climate Classification at the subdistrict (tambon) level. 14 Our study area includes only three climatic zones: equatorial, monsoon, and tropical savanna. We combined the dummies for the equatorial and tropical savanna zones into a single variable, as less than 4% of the treated points were in these two zones. The reference (excluded) zone in the selection model was thus the monsoon zone.
Mangrove-resident animal species. We considered several data sources on the distribution of mangrove-resident animal species in Thailand. We focused on amphibians, birds, 11 http:// www. savgis. org/ donne es-sig-2/ thail and. html. 12 https:// www. soest. hawaii. edu/ pwess el/ gshhg/. 13 The Global Wind Atlas is a free, web-based application developed, owned, and operated by the Technical University of Denmark (DTU). It is released in partnership with the World Bank Group, utilizing data provided by Vortex, using funding provided by the Energy Sector Management Assistance Program (ESMAP). For additional information: https:// www. globa lwind atlas. info/. 14 http:// commo ns. wikim edia. org/ wiki/ File: Thail and_ map_ of_K% C3% B6ppen_ clima te_ class ifica tion. svg. and mammals, whose geographical distributions are better known than for other taxonomic groups. One source was the International Union for Conservation of Nature (IUCN), whose Red List website provides comprehensive data on threatened species throughout the world. An obstacle to using these data was that we wanted data on the geographical distributions of species that were threatened in 1987 (or a year close to it), but IUCN reports current threat status. We therefore instead used data on the total number of species, not the number of threatened species. We acquired raster data on geographical distributions of amphibian species from IUCN and CIESEN (International Union for Conservation of Nature 2015a) and mammal species from IUCN and CIESEN (International Union for Conservation of Nature 2015b). We acquired corresponding data on bird species from BirdLife International and Handbook of the Birds of the World (BirdLife International and Handbook of the Birds of the World. 2018). We determined the number of species in each of the three taxonomic groups at each point in our sample by intersecting the species distributions with the coordinates of the points. We used the total historical range for each species, which refers to the area where the species is known or thought very likely to occur presently or historically. The covariate in the selection model is the first principal component of the variables for the three taxonomic groups. The first principal component contains 87% of the variation in the three variables.
Coasts. Mangroves occur on both of Thailand's coasts (Andaman Sea, Gulf of Thailand). Using the administrative boundaries map from the DMCR, we determined the province where each point in our sample was located and then classified the provinces by coast. No province is on both coasts. We included the dummy for the Gulf of Thailand coast in the selection model.
Distances to cities and roads. We used ArcMap to generate several other distance variables that are commonly included as covariates in the matching literature on forest conservation programs. These variables are proxies for proximity to centers of economic or political influence, which might affect treatment status, outcomes, or both. The first of these variables was distance to the provincial capital. We compiled from Wikipedia and verified (against Thai sources) a list of the capitals of coastal provinces in Thailand and their geographical coordinates. Having determined the province where each point in our sample was located, we calculated the distance from each point to the corresponding provincial capital. The second variable was distance to the district seat. Districts (amphoe) are one level below provinces in Thailand's administrative structure. We constructed this variable in the same way as distance to the provincial capital. The last distance variable was distance to the nearest road. The Digital Chart of the World includes a global road map for 1992. 15 We used this map to measure the distance between each point in our sample and the nearest road in Thailand as of that year. The data on distances to nearest urban area and nearest road are post-1987, which could cause them to be endogenous. The risk of endogeneity is small, however, because these variables are slow-moving and the number of years post-1987 is small.
Distance to mangrove edge. We used GIS tools and the 1987 mangrove layer to compute the distance between each point in our sample and the nearest terrestrial non-mangrove point. To improve balance, we discretized the variable by converting it to a pair of dummy variables, one for points with distances above the 75th percentile of the treated group (Distance to edge: far) and the other for points with distances below the 25th percentile (Distance to edge: near).
Mangrove area share. We used our 1987 mangrove cover dataset, the administrative boundaries map from the DMCR, and GIS tools to determine the total area of mangrove in the subdistrict where each point was located. We similarly determined the total area of each subdistrict, and we generated the mangrove area share variable as the ratio of the two areas.
Population density. Thailand's National Statistical Office (NSO) conducts Population and Housing Censuses every decade, with the 1980 and 1990 censuses straddling the 1987 Cabinet Resolution. Published census documents for these years report data only at a highly aggregated level (i.e., province). The NSO informed us that it could not provide data at the disaggregated level we preferred (i.e., subdistrict) for either year. It advised us to request data instead from the Department of Local Administration, which has generated annual population estimates for subdistrict since 1993. The Department provided us the 1993 data, which we used to generate a population density variable for the subdistrict where each point was located. Given that 1993 is post-treatment, this variable could be endogenous. The risk of endogeneity is small, however, because the Protection Zone accounted for only 1.6% of the total area of the subdistricts where this zone was located. This tiny percentage makes it highly unlikely that the 1987 classification of mangroves into the Protection Zone significantly influenced subsequent human population changes. To improve balance, we winsorized the bottom and top tenths of the distribution of the population density variable (Barnett and Lewis 1994).

Appendix 5: Covariate Balance when Match Number is 1
Here we present the covariate balance table with and without matching when we match each treated point with a single untreated point.