Introduction

Freshwater and estuarine ecosystems are intimately connected to their catchments (Hynes 1975; Johnson et al. 1997). Hydrologic behavior (Rinaldo et al. 1995), biophysical processes, and ecological functions (Black 1997; Johnson et al. 1997; Benda et al. 2004) are all related to the geomorphology of the collection basin. Given these connections, land use within catchments can have strong impacts on aquatic ecosystems.

Many studies have estimated land use impacts by relating aquatic responses to catchment land use proportions (Osborne and Wiley 1988; Allan et al. 1997; Herlihy et al. 1998; Liu et al. 2000; King et al. 2005, 2007; Goetz and Fiske 2008; Jordan et al. 2018), often by applying a linear regression across a sample of catchments:

$$C={\upbeta }_{0}+{\upbeta }_{1}F$$
(1)

where C is an aquatic response, such as the concentration (mass per unit volume) of a material in the receiving water, F is the fraction (proportion) of a catchment occupied by the land use that is the source of the material, and \({\upbeta }_{0}\) and \({\upbeta }_{1}\) are fitted regression coefficients. Both regression coefficients have units of the response (mass/volume for a concentration response), and the land proportion F is dimensionless. For simplicity, we focus on the case where the effect originates from a single land use, here called the source land.

For a concentration response, the regression parameter \({\upbeta }_{1}\) can be interpreted by defining the mass M of material released from source land in a catchment as \(M=mS\), where S is the area of source land and m is mass release per unit area. The entire catchment delivers a volume of water V to the receiving water as described by \(V=vA\), where A is the total area of the catchment and v is the water yield (volume per unit area). Then,

$$C=\frac{M}{V}=\frac{mS}{vA}=q\frac{S}{A}=qF$$
(2)

where \(q=m/v\) is the concentration loading rate (in mass volume−1 area−1), and F is the proportion of source land relative to total catchment area. Regression parameter \({\upbeta }_{1}\) in Eq. (1) provides a statistical estimate of the conceptual loading rate q in Eq. (2).

The regression approach has been modified to evaluate how aquatic responses to land use are moderated by spatial factors, such as transport distance or the arrangement of source land relative to sink ecosystems like riparian buffers, floodplains, or stream corridors (Bohlke and Denver 1995; Lowrance et al. 1997; O'Neill et al. 1997; Omernik et al. 1981; Sponseller et al. 2001; Weller et al. 1998, 2011; Weller and Baker 2014).

To account for transport distance, many studies have replaced land use or impervious surface proportions with “effective” proportions derived from weighting by distance (Comeleo et al. 1996; Van Sickle and Johnson 2008; Walsh and Kunapo 2009; Peterson et al. 2011; Peterson and Pearse 2017; Walsh and Webb 2014). Some studies implemented an attenuation function within a geographic information system (GIS) calculation (Hunsaker and Levine 1995; Soranno et al. 1996; Zhang 2010). Other studies separated spatial analysis from weighting by dividing catchments into distance increments, tabulating total and source areas within each band, and fitting different weighting models using that tabular summary (King et al. 2004, 2005; Zhang 2010). With this enhancement, GIS calculations need not be repeated for each weighting considered, but using just a few bands can sacrifice distance resolution and reduce the accuracy of the analysis.

Inverse power distance weighting (1/d, where d is distance) is the most widely used weighting function (Comeleo et al. 1996; Goetz and Fiske 2008; Peterson et al. 2011), but step, exponential and linear weighting functions have also been used (e.g., Van Sickle and Johnson 2008; Walsh and Webb 2014). Far more studies have implicitly used distance weighting by only considering land use within a fixed distance from a response point (Osborne and Wiley 1988; Johnson et al. 1997; Tufford et al. 1998; Jones et al. 2001; Sponseller et al. 2001; Snyder et al. 2003), often implemented with the “buffer” function of a GIS package.

Most studies have used a weighting calculation that applies distance weights to total area as well to as source area (King et al. 2004, 2005; Van Sickle and Johnson 2008; Peterson et al. 2011; Sheldon et al. 2012; Walsh and Webb 2014), but some have left total area unweighted (e.g., Comeleo et al. 1996; Walsh and Kunapo 2009). The implications of how total area is weighted have not been explored.

Weighting function parameters have not been estimated with statistical rigor. King et al. (2004) tested three inverse power functions and chose one that yielded the greatest improvement in model R2. Others sought optimum weighting function parameters by trying a set of values, creating contours of model skill versus parameter value, and visually identifying the value maximizing model skill (Hunsaker and Levine 1995; Soranno et al. 1996; Van Sickle and Johnson 2008; Zhang 2010; Walsh and Webb 2014). No studies have optimized the weighting function parameters simultaneously with the slope and intercept in land use regression.

Past studies have not effectively quantified how weighting changes inferences about where land use impacts originate within catchments. Most interpretations have been based on plots of the weighting function versus distance (King et al. 2004, 2005) or on the “half distance” at which an effect is reduced by 50% (Van Sickle and Johnson 2008; Walsh and Webb 2014). These approaches do not account for the effects of weighting total land as well as source land, nor do they consider how weighting interacts with the distance distributions of source and total land.

In this paper, we clarify and extend the conceptual foundation of distance-weighted land use regression, enhance the spatial and statistical methods, and provide new tools for understanding spatial patterns. We apply these new methods to a published analysis to illustrate the improved understanding they provide.

Methods

Conceptual foundations of distance weighting

Derivation of weighted land use regression

Distance-weighted regression is very general and can be applied to many problems where an effect propagates from sources to impact points. To simplify the derivation, we focus on a material of interest released from a single source land use type and transported hydrologically to an aquatic measurement point. The derivation accounts for non-conservative transport and dilution while assuming that the system is at steady state so that transport time need not be considered explicitly. We provide a list of mathematical symbols (Table S1) to help interpret the equations.

To account for arrangement effects on an aquatic response C, we divide catchments into bands, each defined by its distance d from the receiving water. The number of bands is B while \({s}_{b}\) and \({a}_{b}\) are the areas of source and total land in band b. Summing across the bands recovers the catchment total source area S and catchment area A.

$$C=\frac{\sum_{b=1}^{B}m{s}_{\mathrm{b}}}{\sum_{b=1}^{B}{va}_{\mathrm{b}}}=q\frac{\sum_{b=1}^{B}{s}_{b}}{\sum_{b=1}^{B}{a}_{\mathrm{b}}}=q\frac{S}{A}=qF$$
(3)

Distance weighting hypothesizes that source land further from the receiving water delivers less material than closer source land. We implement this by multiplying the source area sb in the numerator by a distance weighting function \(w\left(d\right)\) that declines with distance d from the receiving water. Distant land may also deliver less water than closer land, so we multiply the denominator by another function that declines with distance \(x\left(d\right)\). The expression for concentration in the receiving water becomes

$$C=\frac{\sum_{b=1}^{B}m{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{va}_{b}x({d}_{b})}=q\frac{\sum_{b=1}^{B}{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{a}_{b}x({d}_{b})}=qG$$
(4)

The quantity G is the “distance-weighted source proportion” or “effective source proportion”—the proportion of catchment area occupied by source land after adjusting for the effects of distance on the delivery of material and water to the receiving water. We refer to the use of different numerator and denominator weighting functions [w(d) and x(d)] as “separate weighting.” Conceptually, the numerator accounts for attenuating, non-conservative transmission of the material or effect from source land to the receiving water, while the denominator accounts for non-conservative transmission of water or other diluting effects from all the land.

Incorporating Eq. (4) into regression Eq. (1) yields the weighted of concentration or aquatic response to land use

$$C={\beta }_{0}+{\beta }_{1}\frac{\sum_{b=1}^{B}{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{a}_{b}x({d}_{b})}={\beta }_{0}+{\beta }_{1}G$$
(5)

where the regression coefficient β1 provides a statistical estimate of the conceptual loading rate q.

Two published weighting approaches are special cases of Eq. (5). If water delivery attenuates with distance the same way as material delivery, then the denominator weighting function is the same as the numerator weighting function [x(d) = w(d)] and Eq. (5) simplifies to

$$C={\beta }_{0}+{\beta }_{1}\frac{\sum_{b=1}^{B}{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{a}_{b}w({d}_{b})}={\beta }_{0}+{\beta }_{1}G$$
(6)

This represents the “double weighting” method that has been most widely applied (King et al. 2004, 2005; Van Sickle and Johnson 2008; Peterson et al. 2011; Sheldon et al. 2012; Walsh and Webb 2014). If the denominator weighting function x(d) has a constant value of 1 rather than declining with distance, then Eq. (5) simplifies to

$$C={\beta }_{0}+{\beta }_{1}\frac{\sum_{b=1}^{B}{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{a}_{b}}={\beta }_{0}+{\beta }_{1}\frac{\sum_{b=1}^{B}{s}_{b}w({d}_{b})}{A}$$
(7)

This equation incorporates the weighting method applied by Comeleo et al. (1996) and Walsh and Kunapo (2009), which implicitly assumes that water delivery or other diluting effects do not attenuate with distance (“single weighting”). Finally, if w(d) also equals 1, then Eq. 7 reduces to the unweighted relationship (Eq. 2) and G = F.

No assumption about the attenuation of water with distance can be universally correct. One can avoid both assumptions by statistically fitting separate functions for the numerator and denominator (Eq. 4), allowing the data to indicate how much the diluent is attenuated.

Weighting functions

Four weighting functions have been applied (King et al. 2004, 2005; Van Sickle and Johnson 2008; Walsh and Kunapo 2009; Walsh and Webb 2014):

Step function with threshold \({t}_{\mathrm{s}}\)

$$w\left(d\right)=\left\{\begin{array}{c}1,\quad d<{t}_{\mathrm{s}}\\ 0,\quad d\ge {t}_{\mathrm{s}}\end{array}\right.$$
(8)

Linear function with threshold \({t}_{\mathrm{l}}\)

$$w\left(d\right)=\left\{\begin{array}{c}1-d/{t}_{\mathrm{l}},\quad d<{t}_{\mathrm{l}}\\ 0,\quad d\ge {t}_{\mathrm{l}}\end{array}\right.$$
(9)

Exponential function with constant \(k\)

$$w\left(d\right)={e}^{-kd}$$
(10)

Power function with power \(p\)

$$w\left(d\right)=\frac{1}{{d}^{p}}={d}^{-p}$$
(11)

Each of these has a single fitted parameter (ts, tl, k, or p). We also tested a generalized threshold function that can represent concave, convex, linear, and step functions (see Supplemental Information).

An ideal proximity weighting function has three characteristics: declines with distance d, approaching 0 as d gets large; is 1 (no distance effect) at d = 0, and yields the same model results regardless of the units of distance measurement (scale invariance). The last condition can be met if changing units does not change the weighted source proportions G or if changing units multiplies the weighted proportion of every catchment by a constant γ so that regression with the rescaled data yields the same intercept and fit statistics as the unscaled data, but with rescaled slope \({\upbeta }_{1}/\upgamma\). The step, linear, and exponential functions meet all three criteria. The inverse distance function \({d}^{-p}\) meets two criteria but is discontinuous at d = 0. A modified version, (d + 1)p (Van Sickle and Johnson 2008; Peterson et al. 2011; Sheldon et al. 2012; Peterson and Pearse 2017) is 1 at d = 0 but can yield different results in statistical models if the distance measurement units are changed. We also tested two other modified power functions that do meet all three criteria (see Supporting Information).

Spatial distributions of weighted source land and delivered materials

The proportion of source land in distance band b expressed as a fraction of the total catchment area is

$${f}_{b}=\frac{{s}_{b}}{A}$$
(12)

Without distance weighting, the contribution of band b to material concentration at the response point is

$${c}_{b}=q{f}_{b}=q\frac{{s}_{b}}{A}$$
(13)

The sum across bands \(\sum_{b=1}^{B}{f}_{b}\) is the overall fraction of source land in the catchment (F) and the sum \(\sum_{b=1}^{B}{c}_{b}\) is the concentration in the receiving water (C).

The proportion of distance-weighted source land gb in band b as a fraction of distance-weighted total catchment area is

$${g}_{b}=\frac{{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{a}_{b}x({d}_{b})}$$
(14)

And the weighted contribution hb of the band to delivered concentration H in the receiving water is

$${h}_{b}={\beta }_{1}{g}_{b}={\beta }_{1}\frac{{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{a}_{b}x({d}_{b})}$$
(15)

The sum across bands \(\sum_{b=1}^{B}{g}_{b}\) is the weighted catchment source proportion (G) and the sum \(\sum_{b=1}^{B}{h}_{b}\) (plus β0) is the weighted concentration delivered to the receiving water (H). These equations are conceptually related to the “width function” describing the distribution of basin area with distance from the outlet (Rinaldo et al. 1995). In the regression model, β1 provides a statistical estimate of the loading rate q so hb = β1gb can be used to interpret the spatial distribution of material source for the regression model.

When applied across the distance bands of a catchment, Eq. (14) captures the combined effect of the spatial distributions of source and total land, weighting of source land, and weighting of total land on the spatial distribution of effective source land. Equation (15) provides the same integration to capture the combined effects on the spatial distribution of material delivery.

The relationships in Eqs. (14) and (15) help visualize and summarize the spatial distributions of source land and delivered materials in a catchment as well as to quantify how those distributions shift in a weighted analysis compared to an unweighted one. Plotting weighted source land proportion minus actual source land proportion against distance reveals where weighting increases or decreases effective source proportion relative to actual source proportion. Similarly, plotting hb − cb reveals how weighting shifts the distances from which materials are delivered to the receiving water.

Weighted source land proportion minus actual source land proportion is

$$g_{b} - f_{b} = \frac{{s_{b} w\left( {d_{b} } \right)}}{{\sum\limits_{{b = 1}}^{B} {a_{b} } x\left( {d_{b} } \right)}} - \frac{{s_{b} }}{A} = s_{b} \left[ {\frac{{w\left( {d_{b} } \right)}}{{\sum\limits_{{b = 1}}^{B} {a_{b} } x\left( {d_{b} } \right)}} - \frac{1}{A}} \right]$$
(16)

At high distances, \(w\left({d}_{b}\right)\) approaches 0, so \({g}_{b}-{f}_{b}\) approaches a negative value (\(-{s}_{b}/A)\). At the distance where \(w\left({d}_{b}\right)\)= \(\left[\sum_{b=1}^{B}{a}_{b}x\left({d}_{b}\right)\right]/A\), the term in brackets is 0 so that \({g}_{b}={f}_{b}\) and their difference is zero. Below that distance, the weighted source proportion \({g}_{b}\) is greater than unweighted proportion \({f}_{b}\), while above that distance the weighted proportion \({g}_{b}\) is less than unweighted source proportion \({f}_{b}\).

If the total land area in the denominator is not weighted (single weighting), then Eq. (16) simplifies to

$${g}_{b}-{f}_{b}=\frac{{s}_{b}w\left({d}_{b}\right)}{A}-\frac{{s}_{b}}{A}={s}_{b}\left[\frac{w\left({d}_{b}\right)-1}{A}\right]$$
(17)

Because \(w\left({d}_{b}\right)\) is a fraction, the term in brackets is always negative, so that with single weighting, the weighted proportion \({g}_{b}\) is less than the unweighted \({f}_{b}\) at all distances.

The same patterns apply to the difference in material delivery between weighted and unweighted models because material delivery is simply the constant q times weighted proportion (Eq. 16).

$$h_{b} - c_{b} = qs_{b} \left[ {\frac{{w\left( {d_{b} } \right)}}{{\sum\limits_{{b = 1}}^{B} {a_{b} } x\left( {d_{b} } \right)}} - \frac{1}{A}} \right] = \beta _{1} s_{b} \left[ {\frac{{w\left( {d_{b} } \right)}}{{\sum\limits_{{b = 1}}^{B} {a_{b} } x\left( {d_{b} } \right)}} - \frac{1}{A}} \right]$$
(18)

Realized weighting function

We define the “Realized weighting\({y}_{b}\) for a band b as the ratio of weighted source proportion \({g}_{b}\) for that band (Eq. 14) to the unweighted source proportion fb for that band (Eq. 13)

$${y}_{b}=\frac{{g}_{b}}{{f}_{b}}=\frac{\frac{{s}_{b}w({d}_{b})}{\sum_{b=1}^{B}{a}_{b}x({d}_{b})}}{\frac{{s}_{b}}{A}}= w\left({d}_{b}\right)\frac{A}{\sum_{b=1}^{B}{a}_{b}x\left({d}_{b}\right)}= w\left({d}_{b}\right)J$$
(19)

The realized weighting function yb is simply a constant (J) times the distance weighting function \(w({d}_{b})\), but that constant differs among catchments. The term \(A/ \sum_{b=1}^{B}{a}_{b}x({d}_{b})\) is the ratio of unweighted to weighted total catchment area. For each catchment, this ratio takes a unique value (J) that depends on weighting function x(db), catchment area A, and the distance distribution of total area within that catchment (the width function, Rinaldo et al. 1995). With an ideal weighting function, weighted total area is less than total area, so that the term is greater than one and amplifies the effect of the weighting function. The function yb takes the value 1 at the distance where \(w\left({d}_{b}\right)=1/J\). This point is helpful for interpreting the behavior of yb. At smaller distances, realized weighting yb is > 1, while yb < 1 at distances beyond this point.

New tools for implementing weighted regression

Geographic methods for quantifying spatial distributions

We quantified the distance distributions of source and total land for catchments using a GIS script that integrates digital maps of land use or land cover, water bodies, and elevation to calculate the distance from each cell of the raster land use or land cover map to the receiving water body. The script can calculate Cartesian distance or distances along downhill flow paths. Raw distance values were divided by 10 and converted to integers to produce compact tabular summaries, which were exported to files with columns for catchment identifier, distance increment, total land, and source land. This method delivers distance distributions resolved to the precision of the land cover raster instead of lumping distances into a few bands defined by arbitrary, a priori ranges (as in King et al. 2004, 2005; Zhang 2011).

Improved regression methods

We implemented non-linear least squares regression for estimating weighting function parameters simultaneously with the regression slope and intercept (Eq. 1). We used the R optimx package (Nash and Varadhan 2011), which includes algorithms that work well with discontinuous functions (e.g., Eqs. 8 or 9). optimx requires a user-supplied function to calculate the model residual sum of squares (RSS) for any values of the weighting and regression parameters. We wrote an R function that uses the tabular summaries of distances from the spatial analysis. It applies the weighting equation with the parameters supplied by optimx to all distance values, source areas, and total areas for each catchment. Those weighted areas are summed for each catchment to produce a weighted source proportion G for each catchment. Then, the values of regression parameters β0 and β1 provided by optimx are applied to each weighted source proportion G to calculate a predicted response. Finally, observed values are subtracted from the predictions, squared, and summed for return to optimx.

We performed model comparisons based on information theory to contrast model performance for different weighting functions and to test hypotheses about weighting (Akiake Information Criterion, AICc, Burnham and Anderson 2002). We also calculated model probabilities (Akaike model weights) as a weight of evidence for the best model within a model set (Burnham and Anderson 2002).

Case study

Test data

To test our methods, we reanalyzed data from a published study of how the distance from developed land to the shoreline affected the concentration of polychlorinated biphenyls (PCBs) in the tissues of fish caught in 14 subestuaries of Chesapeake Bay (King et al. 2004). In that study, the distance between each land cover cell and the shoreline was calculated. Cartesian distance was used even though downhill flow path distance provides a better description of hydrologic transport in most landscapes (King et al. 2005; Baker et al. 2006, 2007; Weller et al. 2011; Weller and Baker 2014), because flow path algorithms did not perform effectively for the very flat catchments of the subestuaries. For four possible measures of developed land cover (total developed, high intensity residential plus commercial, commercial, and impervious surface) and for all land, the counts of pixels were aggregated into seven distance classes. Subestuary average PCB concentration in fish was related to developed land covers using inverse distance-weighted regression. Percent commercial land, weighted by d−1, was the best predictor of PCBs in fish, and weighting increased R2 to 99% compared to 55% for unweighted commercial land. The study demonstrated a strong spatial effect but left opportunities to improve the analysis and refine its interpretation. King et al. (2004) provide more details and results.

Catchment characteristics

We used the watershed boundaries and land cover data presented by King et al. (2004) but applied our new spatial analysis (above) to obtain more precise tabulations of the spatial distributions of the Cartesian distances from commercial land and all land to the subestuary. To visualize the distributions, we plotted histograms of commercial land area and total land area versus distance to the subestuary using 200 m distance bins.

Model comparisons

We fit 15 weighted models to test hypotheses about distance weighting and to identify the most effective weighting functions. These 15 models reflect three ways of weighting the total area in the weighting equation (single, double, and separate) and four distance weighting functions (step, linear, exponential, and power) plus the unweighted model. For the power function, we used (d + 1)−p instead of d−p. We quantified model performance with four metrics (RMSE, R2, AICc, and Akaike model weight). We compared models within three subsets that were selected to answer three questions:

  • Is there evidence for an arrangement effect?

  • How should total land be weighted?

  • What is the best weighting function?

How does weighting affect apparent land use proportions?

For each weighting function, we quantified how weighting changed the catchment proportion of weighted source land (G) relative to the unweighted proportion (F) as well as weighted material concentration (H) relative to the unweighted concentration (C). We also plotted predicted material concentration versus weighted source proportion (G) for the 14 subestuaries along with the fitted regression line to show how weighting affected the strength of association and the magnitude of residuals around the fitted line.

Where in the system do materials originate?

For models with four weighting functions as well as the unweighted model, we visualized the spatial distributions of how weighting shifts source land proportions (gb − fb, Eq. 16) and contribution to material concentration (hb − fb, Eq. 18) using histograms with 200 m distance bins. We examined aggregate distance distributions for all the catchments together to focus on the differences among models, and we examined them for individual catchments to reveal how weighting functions interact with differences in catchment land use patterns.

Evaluation of extreme observations

Like King et al. (2004), we repeated the analysis after omitting two systems with distinctly high PCB to verify that key results do not depend on extreme values (see Supporting Information).

Results of the case study

Distance distributions of source land and total land

The catchment areas of the 14 subestuaries ranged from 46 to 662 km2 and summed to 2720 km2. The proportion of commercial land ranged from 0.25 to 13.9% among the catchments and was 4.33% of the summed study area. GIS analysis of distances to shoreline yielded 4715 unique increments ranging from 30 to 47,180 m as well as the areas of commercial and total land at each distance for each catchment. The distributions of total and commercial land areas versus distance from the subestuary for two example catchments illustrate differences among catchments in size, shape, and the spatial distributions of source and total area (Fig. 1). Watershed 32 was larger than catchment 29, and the distribution of catchment area declined much less strongly with distance in watershed 32. Commercial land in watershed 32 was concentrated very close to the subestuary, while the peak in commercial land area for watershed 29 occurred at some distance from the subestuary.

Fig. 1
figure 1

Distributions of watershed area (gray) and commercial land area (black) versus distance from the subestuary for the watersheds of two numbered subestuaries and for the aggregate of all study watersheds

Model comparisons

AICc-based model comparisons within subsets of models addressed the specific hypotheses or questions represented in the headings and tables presented below.

Is there an arrangement effect?

There was very strong evidence that spatial arrangement is important for the test data. Among double-weighted models with four different weighting functions, even the poorest lowered AICc by 43.0 compared to the unweighted model (Table 1), translating to a model probability of 1.0 for the weighted model compared to 0.0 for the unweighted. Any of the models was overwhelmingly more likely than the unweighted model, providing rigorous, probabilistic evidence that spatial arrangement is important and should be included in relating land use to the observed PCB concentrations in fish.

Table 1 Performance metrics and model parameters for the unweighted model and for double-weighted models of the test data with five different weighting functions

How should total land be weighted?

For the test data, there was strong support for double weighting (Eq. 6) and virtually no support for single weighting (Eq. 7). With double weighting, any of the weighting functions (Eqs. 811) provided a far better explanation of the data than the unweighted model (Table 2) improving R2 from 51% to at least 98% and lowering AICc by at least 43.0. None of the weighting functions applied with single weighting provided any increase in R2 and all had higher AICc values than the unweighted model (see also Supporting Information).

Table 2 Comparison of models of the test data using single or double distance weighting

The test data did not support separate weighting parameters for the numerator and denominator of the weighting function. Across all weighting functions, fitting separate weighting functions (which added a parameter to any weighted regression model) increased R2 at most by 0.0032 and raised rather than lowered AICc, which increased between 1.2 and 5 above the unweighted model (Table 3). Also, fitted parameters of the separate weighting functions in the numerator and denominator were similar.

Table 3 Comparison of models for the test data using double or separate distance weighting

What is the best weighting function?

With double weighting, the step function provided the best model among the four candidate functions (Table 1). The AICc for the step function was 5.0 lower than for the next best fit, the linear function. Relative probabilities were 0.91 for the step function and 0.08 for the linear function. Exponential and power weighting functions performed more poorly, with ΔAICc values of 8.9 and 12.1 and model probabilities of 0.01 and 0.002, respectively. When the two subestuaries with the highest PCB levels were removed, the step function still provided the strongest model, but its AICc differences with the linear, exponential, and power functions were reduced (ΔAICc 1.3, 1.9, and 4.2, respectively, Table S7).

How does weighting affect apparent land use proportions?

Different weighting functions applied in double-weighted models reduced effective source and total areas by differing degrees. Compared to the unweighted total area of 2720 km2, weighted total areas for step, linear, exponential, and power functions were reduced to 750, 594, 486, and 116 km2, respectively, while commercial land area was reduced from the unweighted 118 km2 to 42.3, 33.4, 27.4, and 0.71 km2, respectively. Smaller and smaller fractions of the catchment appeared to contribute to material delivery as the strength of weighting increased, and the contributing fraction of catchment was especially small with inverse distance weighting.

Weighted commercial land cover proportions (G) were higher than unweighted proportions (F) for some catchments and lower for others (Figs. 2, S4; Table S8). All four weighting functions increased G relative to F for watershed 32 and did the opposite for watershed 29. Watershed 3 and 4 showed smaller changes. The remaining watersheds had lower commercial land proportions that were not shifted much by weighting.

Fig. 2
figure 2

Regressions of PCB concentration versus commercial land proportion (blue lines with gray confidence intervals) for unweighted proportions and for weighted proportions from four different distance weighting functions applied with double weighting. Four numbered watersheds are highlighted in red. Grey lines show how distance weighting shifts the commercial land proportions from their unweighted values (gray dots) to distance-weighted values (black or red dots)

Shifts in G relative to F increased the correlation of the dependent variable (PCB concentration) with G compared to the correlation of PCB with F, thus increasing the explanatory power (R2) of weighted compared to unweighted regressions (Fig. 2; Table 1). The increase in G relative to F for watershed 32 had the strongest leverage in increasing the correlation of PCB versus G compared to the correlation of PCB versus F and in yielding a stronger weighted regression (higher R2, lower AICc) with narrower confidence limits compared to the unweighted regression. The decrease in G relative to F for watershed 29 had the next strongest effect, followed by watersheds 4 and 3.

Where in the system do materials originate?

Distance weighting changed where PCBs appeared to originate within the study catchments. The four weighting functions had different effects, and the effects of each function differed among catchments.

All weighting functions increased weighted source proportions (gb) relative to simple proportions (fb) for small distances while decreasing weighted proportions relative to simple proportions at larger distances (Fig. 3). The power function generated the greatest increases in effective proportion at small distances, followed in order by the exponential, linear, and step functions. The same patterns were seen in other measures: material delivery (hb) (Fig. 3), the minimum distance needed to supply certain percentages of material delivery (Table S9), and the cumulative distribution of delivery (Fig. 4). In the unweighted model, land less than 2811 m from the subestuary supplied half of the material delivery across the entire study area, while the step function model indicated land less than 516 m supplied half of material delivery. The distance decreased further to 413, 351, and 101 m for the linear, exponential, and power functions, respectively (Fig. 4, Table S9).

Fig. 3
figure 3

Distance distributions of four measures of material source strength for the aggregate of all study watersheds. The measures are (top) distance-weighted commercial land proportion (gb), (second row) distance-weighted material source strength (hb), (third row) the ratio hb/cb of weighted material delivery to unweighted material delivery (the realized weighting), and (bottom) the difference between distance-weighted material delivery and unweighted material delivery (hb − cb). In the lower two panels, data for distances where weighted delivery is greater than or equal to unweighted delivery (hb ≥ cb) are shown in black, while data for distances where weighted delivery is less than unweighted delivery (hb < cb) are shown in red

Fig. 4
figure 4

Cumulative fraction of total material delivery versus distance from a subestuary for the aggregate of all study watersheds. Cumulative distributions are shown for double-weighted models with four distance weighting functions as well as for the unweighted model

The distributions of material delivery relative to the unweighted model (\({h}_{b}-{c}_{b}\), Eq. 18, Fig. 3) for individual catchments (Fig. 4) also explain how weighting interacted with the distributions of source land and total area to produce shifts in G relative to F (Sect. 3.3). The greatest change with all four weighting functions was a more than threefold increase in G relative to F for watershed 32 (Figs. 2, S4; Table S8). The subestuary of watershed 32 had the highest measured PCB concentration and its distributions of land were unusual (Fig. 1). The distribution of source (commercial) land fb versus distance had a strong peak at small distance (around 300 m), reflecting higher prevalence of source land near the subestuary shorelines than in the aggregate of all the catchments. In addition, the distribution of catchment area in watershed 32 fell off more slowly with distance than in the other watersheds because a greater fraction of total catchment area was further from the subestuary than in the aggregate. Weighting functions that increased effective proportion gb relative to simple proportion fb at small distances strongly increased overall effective proportion G in this catchment where source land was more prevalent near the subestuary. Watershed 32 had very high values of \({g}_{b}-{f}_{b}\) and \({h}_{b}-{c}_{b}\) at small distances with any weighting function (Fig. 5).

Fig. 5
figure 5

Changes in PCB delivery with distance in double-weighted models with four distance weighting functions relative to the unweighted model, calculated as weighted delivery minus unweighted delivery (hb − cb) at each distance. Black bars show how much weighted delivery exceeds unweighted delivery, while red bars show how much weighting reduces delivery relative to the unweighted model. Distributions are shown for the watersheds of two subestuaries and for the aggregate of all the watersheds

Watershed 29 showed the next greatest change in G relative to F, but in the opposite direction: G was reduced to roughly half of F by any of the weighting functions (Fig. 2, Figure S4, Table S8). In watershed 29, the peak of the distribution of commercial land relative to distance was at larger distances, peaking around 2900 m (Fig. 1). Compared to watershed 32, less commercial land occurred close to the subestuary where weighting increased effective proportions gb relative to fb, whereas more occurred farther away where weighting reduced gb relative to fb (Fig. 5). Watershed 32 showed big gains in gb and hb at small distances and minor changes in gb and hb at larger distances. This led to the large increase in G relative to F. In contrast, low fb values close to the subestuary in watershed 29 provided little opportunity for weighting to yield high gb values, but weighting did lead to large reductions in gb relative to fb at larger distances where much commercial land was located (Fig. 5).

Realized weighting

Realized weighting for any of the weighting functions (black lines, Figure S7) emphasized nearby land relative to distant land even more strongly than suggested by the weighting function alone (red lines, Figure S7). Exaggeration of realized weighting relative to the weighting function was greatest for the power function and less for the exponential, linear, and step functions, in that order. Realized weighting also differed among catchments because the realized effect depends on the spatial distributions of both source and total land (Eq. 19), which differed among catchments. The ratio of realized weighting to the weighting function was a distance-independent constant that differed among catchments (factor J in Eq. 19, Table S8).

Discussion

We have presented a comprehensive set of methodological improvements for fitting and interpreting spatially weighted regressions relating land use to aquatic responses. We tested our methods by reanalyzing a published data set to focus on the methodological enhancements rather than on introducing new data. King et al. (2004) had already documented that distance weighting yields strong increases in explanatory power for the test data. Our new methods were able to enhance an already strong published analysis by providing more rigorous answers to the following four questions:

Is there an effect of arrangement?

The first challenge for models incorporating landscape pattern is demonstrating an arrangement effect. Model comparison with AICc demonstrated that even the poorest of the four double-weighted models for the test data has a model probability of 1.0 compared to an unweighted model–rigorous, probabilistic evidence that spatial arrangement is important and should be included in relating land use to the observed PCB concentrations in fish. The previous study reported that distance-weighted models increased R2 relative to unweighted models (King et al. 2004), but that measure alone provides much weaker evidence for an arrangement effect.

What is the best weighting function?

Specifying the weighting function is also an important step in spatially weighted regression. Most studies have just used inverse distance weighting (1/d). King et al. (2004) tested inverse power functions (d−p) with three different p values. Other studies considered additional mathematical functions and estimated weighting parameters with response surfaces created from small sets of possible values (Van Sickle and Johnson 2008; Walsh and Webb 2014). We tested several functions, optimized weighting and regression parameters simultaneously, and compared models with AICc. For the test data, we identified an inverse distance power function with p = 0.80 that gave a stronger weighted model (model probability = 0.92, Table S3) than originally reported with p = 1 (King et al. 2004). Furthermore, exponential, step, and linear weighting functions all produced even stronger models.

A step function gave the best distance-weighted model for the test data. Considering only land use within a fixed distance is a simple idea that is easy to implement with the GIS “buffer” function, making it the most common way to represent a greater impact of nearby land than distant land. However, our results do not support this method because our implementation of the step function has important differences from applying a fixed width buffer. We treated the threshold distance for the step function as a parameter to be estimated from the data rather than an arbitrarily, a priori value; and the step function was judged best only when it excelled compared to other functions. This approach brings the idea of a fixed zone of impact under the more general and rigorous umbrella of spatially weighted regression.

The best weighting function will likely differ among data sets that represent different variables and processes. For some problems, a priori considerations may suggest a particular weighting function. For example, the exponential weighting function can represent a distance effect arising from a first order uptake process while inverse distance models are often used for processes like seed dispersal (Canham and Uriarte 2006). We recommend comparing models with different weighting functions. Better performing models for other published data sets might have been found if more weighting functions and parameter values had been considered, and comparing models can yield insight into likely mechanisms. For future analyses, our general framework for fitting and comparing multiple models should become a standard approach for catchment-scale assessment of land use impacts on aquatic responses. Applying our methods to other data sets will test the generality of our findings with the test data.

How should total land be weighted?

The question of how to weight total land in the denominator of the weighting equation was not thoroughly discussed in previous papers. For the test data, there was very strong support for weighting total land the same way as source land (double weighting) and virtually none for leaving source land unweighted (Table 2). Double weighting also performed better than using different weighting parameters for source and total land (separate weighting) although R2 and AICc values for this comparison were similar. Conceptually, the superiority of double weighting suggests that PCBs from source land and water from all land decline with distance from the estuary in a similar way. The test data results support the common choice of double weighting (as in King et al. 2005, 2004; Van Sickle and Johnson 2008; Peterson et al. 2011; Sheldon et al. 2012; Walsh and Webb 2014). We expect that the best choice will differ among problems and data sets. More studies need to explicitly compare single, double, and separately weighted models to assess the generality of our findings for other problems and processes. Larger data sets could also support fitting more weighting parameters and so provide more scope for separate weighting to yield different results from double weighting. There may also be cases where weighting of numerator and denominator should use different mathematical functions, not just the same function with a different parameter.

Where in the system do materials originate?

Our new interpretative tools reveal how weighting affects the attribution of where effects originate. Comparing the spatial distributions of weighted source proportion (Eq. 14) and weighted material delivery (Eq. 15) to the corresponding distributions for an unweighted model reveals how weighting affects the relative contributions of land near to and more distant from receiving waters (Figs. 3, 5; Table S9). Such comparisons enable us to visualize, quantify, and interpret the impacts of different weighting functions. These analyses also reveal how weighting effects differ among catchments (Figs. 5, S5, S6; Tables S8, S9) because of differences in spatial distributions of source land and total land (Fig. 1). This is a key strength of our approach. Previous studies with distance-weighted land use regressions could not offer such interpretations because they relied only on plots of the weighting function versus distance (e.g., King et al. 2004) or on half decay distances (Van Sickle and Johnson 2008; Walsh and Webb 2014). To correctly quantify the weighting effect, one must integrate effects of applying a weighting function in both the numerator and denominator of the weighting equation as well as the actual distributions of source and total land, as achieved here. Accomplishing this provides enhanced guidance for environmental management. In the test case, the best model suggests that half the PCBs originate within just 0.516 m of a subestuary (Fig. 4, Table S9), so that concentrating management actions to reduce sources within that specific zone would have the maximum benefit.

Generality and applicability

The methodological enhancements provided here make it easier to conceptualize and operationalize models linking landscape pattern to biophysical processes. For simplicity and clarity, our language has focused on the context of materials released from source areas and transported hydrologically to response points in aquatic systems. Studies in this vein have applied distance weighting to understand both water quality and biological responses (e.g., Frimpong et al. 2005; King et al. 2005; Van Sickle and Johnson 2008; Peterson et al. 2011). However, the underlying mathematics are completely general and can be applied to other contexts where patterns affect processes, such as understanding the impacts of proximate physiography on ground water delivery, land cover on animal occupancy or movement within landscape patches, or the influence of surrounding land cover on urban heat (e.g., Baker et al. 2003; Cushman et al. 2008; Cunningham and Johnson 2016; Dixon et al. 2020; Alonzo et al. 2021).

Different transport or movement processes can be represented by choosing different measures of distance. In the test case, we measured distance as the shortest path from each land pixel to the receiving water (Cartesian distance). Other ways of expressing can represent different levels of specificity about effect transmission. Many studies have measured distance along downhill flow paths (e.g., King et al. 2005, 2012; Baker et al. 2006; Walsh and Webb 2014; Weller et al. 2011, 1998; Weller and Baker 2014) or along in-stream flow paths through a stream network (e.g., King et al. 2005; Sheldon et al. 2012; Walsh and Webb 2014). Distance measures can also integrate other hydrological factors, such as local slope, flow path gradient, upslope contributing area (Peterson et al. 2011), or other spatial indices of hydrologic function. Because models are fit by numerical optimization rather than linear regression, our method can also be applied to problems where the underling response model is non-linear.

Our methods can be further extended to questions arising from many disciplines, including ecology, evolution, geography, geomorphology, epidemiology, public policy, and others. For problems in which spatial patterns can affect processes, the impacts of patterns on overall function can now be explicitly tested and quantified. Our methods can relate almost any spatially distributed source or driver to an integrated response at a point or along a boundary; and alternate hypotheses about the effects of pattern or proximity on processes can be tested with alternative weighting functions. As new applications are implemented, additional weighting functions will likely emerge and enhance our general approach. The evolving framework can be applied to inform understanding in a wide range of scientific investigations and applied contexts.