# Inadequacy of Conventional Grab Sampling for Remediation Decision-Making for Metal Contamination at Small-Arms Ranges

## Abstract

Research has shown that grab sampling is inadequate for evaluating military ranges contaminated with energetics because of the highly heterogeneous distribution of these residues. Similar studies assessing the heterogeneous distribution of metals at small-arms ranges (SARs) are lacking. To address this gap, we evaluated whether grab sampling provides appropriate data for performing risk analysis at metal-contaminated SARs characterized with 30–48 grab samples. We evaluated the extractable Cu, Pb, Sb, and Zn content of the field samples using a Monte Carlo random resampling with replacement (bootstrapping) simulation approach. Results indicate that at one site the 95% confidence interval of the mean Pb concentration (432 mg/kg) was 200–700 mg/kg, with a data range of 5–4500 mg/kg. Because the U.S. Environmental Protection Agency screening level for lead is 400 mg/kg, the need for cleanup at this site is unclear. Resampling based on subsets of 7 and 15 samples, sample sizes more realistic for an area of this size, yielded high false negative rates.

## Keywords

Grab sampling · Heterogeneity · Metals · Residue · Small-arms range · Soil

## Introduction

Grab sampling results from small-arms ranges (SARs) are typically used to describe the spatial and temporal deposition of metal contamination. Grab samples, sometimes referred to as discrete samples, involve the collection of a small mass of soil from a single point within the investigation area. Grab sampling has been used to support a variety of Data Quality Objectives. In particular, grab sampling is often used to estimate mean contaminant concentrations in a study area (e.g., to support "baseline" risk assessments during Remedial Investigations). However, the small sample sizes routinely used in environmental assessments yield highly variable estimates of the mean (ITRC 2012; Clausen and Korte 2009). Work by the USEPA (1995) and van Ee and Blum (1990) suggests that the number of grab samples required to represent field conditions accurately is often several times larger than the number environmental practitioners typically collect, although the required number varies with project objectives and available resources.

Over the past decade, research has shown that military training activities release energetic residues into the environment as particulates, resulting in spatially heterogeneous distributions (Jenkins et al. 2005a, b; Hewitt et al. 2005). Research also demonstrates that when such particulates are present in soils, conventional grab sampling, whether conducted in a biased or random manner or with large sample numbers (USEPA 2002, 1995), does not accurately characterize mass loading (Gy 1992). Because soils at military SARs often contain fragments of bullets and cartridge jackets, studies have likewise questioned the efficacy of conventional grab sampling for characterizing this residual metal contamination (Clausen 2015; Clausen et al. 2012). The objectives of the present study were to investigate the nature and extent of metal contamination at SARs and the reliability of conventional grab sampling methods for quantifying it.

Hadley and Mueller (2012) have argued that a grab sample is representative of only a "sampling point." When a high degree of contaminant heterogeneity is present, it represents only the aliquot of material analyzed at the laboratory, typically 0.5–2 g for metals. Any inferences about the "true" metal concentration in the remaining soil in the sample container are therefore not reliable.

Consequently, if a single aliquot (subsample) is not representative of an entire sample, then the common practice of inferring contaminant concentrations at adjacent locations (collocated field duplicate samples) from the concentration reported for a single aliquot is flawed (Brewer et al. 2017). Studies of energetic materials released into the environment have shown that particulate distribution is random and that autocorrelation between sample points is not a valid assumption (Jenkins et al. 1997). Moreover, removing the original sample from the environment makes it impossible to obtain a true field duplicate, because the sample point no longer exists (Hadley and Mueller 2012). Adjacent collocated samples or duplicates containing a heterogeneous distribution of particulates are therefore effectively unrelated to the original sample, as shown by the poor agreement between original and duplicate sample results (Brewer et al. 2017; Walsh et al. 2005).

Similarly, the validity of inferring the total sample population based on a few analytical results is questionable (Hadley et al. 2011). Yet, collecting a few grab samples is standard within the environmental industry and has been since the inception of environmental soil sampling (Brewer et al. 2017; USEPA 2000, 1995).

We performed this study to evaluate whether grab sampling can provide data of sufficient quality to estimate mean contaminant concentrations at SARs. Our approach involved collecting a larger-than-typical number of grab samples (30–48) and resampling the data hundreds of times using computer simulations (bootstrapping) to evaluate whether such sampling is practical for a military SAR.

## Materials and Methods

This study used several methods to estimate the number of grab samples needed to achieve a representative result. At each range except Fort Wainwright, the SAR berm dimensions were nominally 3 m high by 100 m long. At Fort Wainwright, the site consisted of 16 individual berms with an approximate total area of 300 m², similar to the other three berms. For the purposes of this study, these 16 berms were treated collectively as a single berm. The U.S. Environmental Protection Agency (USEPA) recommends collecting 15 grab samples for an area of approximately 300 m² (USEPA 1995). By comparison, the statistical software program Visual Sample Plan (VSP) (Matzke et al. 2010) calculated a required sample size of 10–30 using 5% tolerances for Type I and Type II errors. Personal observations and discussions with environmental consultants and regulatory officials suggest there is no consensus on the number of samples needed to characterize a SAR berm. According to Brewer et al. (2017), the inefficiency and unreliability of grab samples cannot be resolved by collecting more grab samples.
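This paper does not reproduce the VSP calculation. The Python sketch below is a rough illustration of how such tools trade Type I and Type II error rates against measurement variability. It assumes the normal-approximation sample-size formula for a one-sample t-test given in USEPA Data Quality Objectives guidance. The function name is illustrative, and the gray-region width is a hypothetical input, not a value from this study's VSP runs. The formula also assumes approximately normal data, an assumption that the skewed metal results discussed below do not satisfy.

```python
import math
from statistics import NormalDist


def one_sample_t_sample_size(sigma: float, delta: float,
                             alpha: float = 0.05, beta: float = 0.05) -> int:
    """Approximate sample size for comparing a mean with a fixed threshold.

    Assumed normal-approximation formula (as in USEPA DQO guidance):
        n = sigma**2 * (z_{1-alpha} + z_{1-beta})**2 / delta**2 + 0.5 * z_{1-alpha}**2
    sigma: estimated standard deviation of the measurements
    delta: width of the gray region (a hypothetical, user-chosen input)
    """
    std_normal = NormalDist()                 # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha)   # quantile for the Type I error rate
    z_beta = std_normal.inv_cdf(1 - beta)     # quantile for the Type II error rate
    n = sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2 + 0.5 * z_alpha ** 2
    return math.ceil(n)                       # round up to a whole number of samples


# Low variability relative to the gray region gives a modest sample size
print(one_sample_t_sample_size(sigma=1.0, delta=1.0))
# Fort Wainwright Pb SD from Table 1 (978 mg/kg) with a hypothetical 200 mg/kg gray region
print(one_sample_t_sample_size(sigma=978.0, delta=200.0))
```

Because n scales with (sigma/delta)², the large standard deviations typical of fragment-bearing soils quickly push the computed sample size well beyond the 10–30 range.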

Surface soil samples were collected from four locations: (1) Range 16 Records Range at Fort Wainwright, Alaska; (2) Range 4–3 at Camp Ethan Allen (CEA), Vermont; (3) Northern Berm at the Kimama Training Site in Kimama, Idaho; and (4) 1000-inch Range at Fort Eustis, Virginia. These four ranges represent a variety of physiographic settings, environmental conditions, surface soil types, munition usage, and lengths of time since last use. Clausen et al. (2012, 2013) provide details about soil sampling methods and range specifics. In brief, grab soil sampling followed procedures typically used by the environmental industry, such as using steel scoops to collect enough material to fill a 4 oz glass container. Co-located grab samples were also collected at the CEA site to evaluate the small-scale variability of metal concentrations.

Grab samples were processed the way commercial environmental laboratories typically process them: a 1 g subsample was removed from the top of the sample jar and acid-digested following USEPA Method 3050B (USEPA 1996). No attempt was made to homogenize the sample. Our objective was to mimic the typical approach used by the environmental industry, recognizing that the grab sample results may not be representative of the bulk sample. The U.S. Army Corps of Engineers, Engineer Research and Development Center (ERDC), Environmental Laboratory in Vicksburg, Mississippi, measured concentrations of Cu, Pb, Sb, and Zn, as well as 15 additional metals, using USEPA Method 6010C (USEPA 2006). The instrument was a Perkin Elmer ELAN 6000 quadrupole inductively coupled plasma mass spectrometer with the factory-supplied Ryton plastic spray chamber and fixed cross-flow nebulizer. Clausen (2015) and Clausen et al. (2013) provide quality assurance and quality control details.

## Results and Discussion

A statistical summary of the results for the four SAR berms indicates that the four anthropogenic metals and metalloids (Sb, Cu, Pb, and Zn) consistently exhibit the highest variability of the analytes measured. They also have positively skewed distributions, consistent with their presence as fragments of bullets and bullet jackets (Clausen and Korte 2009). For example, the mean and median Pb grab sample values at Fort Wainwright (432 and 85.7 mg/kg, respectively) differ by a factor of about five. The Pb results at the other ranges show similarly large mean–median differences, large variances, and skewed distributions (Table 1). As discussed in a companion paper (Clausen et al. 2017), replicate multi-increment samples collected from this site and the other three sites suggest that the grab sample results significantly underestimate the mean metal concentrations.

Table 1. Summary of grab sample results for the four SAR berms sampled

| Analyte | n | Mean | Median | Minimum | Maximum | Variance | Skewness | SD | RSD (%) |
|---|---|---|---|---|---|---|---|---|---|
| **Range 16 Records Range, Fort Wainwright, AK** | | | | | | | | | |
| Mass (g) | 48 | 159 | 162 | 103 | 197 | 1E+05 | 6.9 | 19.1 | 12 |
| Cu (mg/kg) | 48 | 81.0 | 27.5 | 2.58 | 852 | 3E+04 | 3.7 | 177 | 218 |
| Pb (mg/kg) | 48 | 432 | 85.7 | 5.01 | 4500 | 1E+06 | 3.3 | 978 | 226 |
| Sb (mg/kg) | 8 | 14.0 | 7.41 | 2.07 | 32.6 | 182 | 0.5 | 13.5 | 97 |
| Zn (mg/kg) | 48 | 52.6 | 47.9 | 4.88 | 146 | 499 | 3.0 | 22.3 | 42 |
| **Range 4–3, Camp Ethan Allen, VT** | | | | | | | | | |
| Mass (g) | 30 | 144 | 146 | 122 | 159 | 90 | −0.8 | 10.1 | 7 |
| Cu (mg/kg) | 30 | 300 | 270 | 69.8 | 598 | 2E+04 | 0.6 | 132 | 44 |
| Pb (mg/kg) | 30 | 5060 | 1238 | 43.9 | 79,020 | 2E+08 | 3.6 | 14,438 | 285 |
| Sb (mg/kg) | 30 | 87.8 | 10.0 | 0.898 | 2072 | 1E+05 | 2.0 | 375 | 427 |
| Zn (mg/kg) | 30 | 66.1 | 61.9 | 35.8 | 111 | 306 | 0.6 | 17.6 | 27 |
| **Kimama TS western berm, Kimama, ID** | | | | | | | | | |
| Mass (g) | 30 | 100 | 99.2 | 66.5 | 135 | 3E+04 | 3.6 | 16.8 | 17 |
| Cu (mg/kg) | 30 | 23.0 | 18.1 | 9.83 | 74.2 | 231 | 1.9 | 15.2 | 66 |
| Pb (mg/kg) | 30 | 493 | 73.5 | 11.1 | 9060 | 3E+06 | 5.3 | 1645 | 334 |
| Sb (mg/kg) | 4 | 20 | 3.02 | 2.16 | 70.2 | 1138 | 1.7 | 33.7 | 172 |
| Zn (mg/kg) | 30 | 45.4 | 44.9 | 21.8 | 56.2 | 57 | −1.0 | 6.88 | 15 |
| **1000-in. Range, Fort Eustis, VA** | | | | | | | | | |
| Mass (g) | 33 | 78.8 | 22.0 | 1.00 | 116 | 4E+05 | 5.7 | 19.9 | 25 |
| Cu (mg/kg) | 33 | 43.3 | 13.2 | 7 | 755 | 2E+04 | 5.5 | 129 | 298 |
| Pb (mg/kg) | 33 | 434 | 94.3 | 17.6 | 8770 | 2E+06 | 5.5 | 1517 | 350 |
| Sb (mg/kg) | 8 | 11.0 | 1.01 | 0.023 | 69.6 | 773 | 2.3 | 24.0 | 219 |
| Zn (mg/kg) | 33 | 28.6 | 27.6 | 21.0 | 48.1 | 45 | 1.5 | 6.47 | 23 |

The SAR metals Pb, Sb, Cu, and Zn exhibited the highest RSDs (up to 427%) at each of the four sites (Table 1). RSDs for the other elements, such as Fe and Mn, were much smaller, typically less than 30% (Clausen 2015). The high RSDs for the SAR metals indicate anthropogenic input. Evidence that the native metals are uncontaminated includes their Gaussian distributions and smaller variances (Clausen 2015). The distributions of native metal concentrations result from natural soil-formation processes, and their variability is small relative to that of the anthropogenically augmented metals. Consistent with the known anthropogenic sources of these metals and their mode of deposition, the analytical results suggest an extremely heterogeneous and random distribution of anthropogenically derived metals. JMP Software version 10 (SAS 2017) was used to calculate descriptive statistics for the metal concentrations, and the sample relative standard deviation (RSD) provided a qualitative measure of precision. Because the "true" (population) mean metal concentration of each berm was unknown, bias could not be strictly determined.
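The Python sketch below shows how the RSD and skewness columns of Table 1 relate to the raw results. It uses only the standard library and is not JMP itself. The adjusted Fisher–Pearson skewness formula is an assumption about what JMP reports, and the input values are hypothetical.

```python
from statistics import mean, median, stdev, variance


def describe(values):
    """Summary statistics mirroring the columns of Table 1 (requires n >= 3)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 denominator
    # Adjusted Fisher-Pearson sample skewness (assumed comparable to JMP's output)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)
    return {
        "n": n,
        "mean": m,
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "variance": variance(values),
        "skewness": skew,
        "sd": s,
        "rsd_pct": 100.0 * s / m,  # relative standard deviation, %
    }


# Hypothetical Pb results (mg/kg): mostly low values plus one fragment-laden sample
stats = describe([5, 10, 20, 40, 900])
print(f"mean {stats['mean']:.0f}, median {stats['median']}, RSD {stats['rsd_pct']:.0f}%")
```

A single high result pulls the mean far above the median and drives the RSD above 100%, which is the pattern seen for the SAR metals in Table 1.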

Figure 1 shows the grab sample results for Pb at the four SAR berms. The highest Pb values appear to occur near the center of the berm at Fort Wainwright, on the left side at the Kimama Training Site, and on the right side at the Fort Eustis range, whereas the distribution at CEA appears bimodal. However, as discussed in the following paragraph, these results are not actually representative of the gridded area or even of the sample location itself. Brewer et al. (2017) discuss the erroneous interpretations of contaminant distribution that can arise from the random, small-scale map patterns generated by discrete-sample (1 g sample mass) datasets at sites with a high degree of small-scale contaminant variability. At CEA, a "co-located" sample collected a short distance from the initial sample point yielded markedly different metal concentrations. Consequently, isoconcentration maps generated from such data give a false depiction of the actual contaminant distribution. When the number of samples is limited, there is often a desire to target data collection to areas believed to have elevated contaminant concentrations. However, the Pb distributions for the four datasets show no consistent pattern. This suggests that, without additional information (e.g., visible site features associated with potential releases), there is no reason to assume a priori that grab sampling can locate areas of elevated contaminant concentrations.

Another difficulty posed by spatially heterogeneous contaminant distributions concerns the usefulness of duplicate samples. We compared field duplicate samples (i.e., co-located samples) collected within 0.5 m of each other at Range 4–3 at CEA to assess small-scale metal variability. The relative percent difference (RPD) is the absolute difference between duplicate results divided by their mean, expressed as a percentage. It is a more common measure of precision for environmental chemical analyses than the RSD. For a pair of duplicates, RPD = √2 × RSD. The RPDs for the anthropogenic metals are approximately an order of magnitude larger than those for the native metals, demonstrating much greater short-range variability (Table 2; Clausen 2015). For example, the original grab sample from the center of Grid 23 had a Pb concentration of 319 mg/kg, whereas the field duplicate had 479 mg/kg. These results lead to conflicting conclusions about whether the Pb concentration is below or above the USEPA screening level of 400 mg/kg (Clausen 2015). Had more co-located grab samples been collected, we expect the observed variability in metal concentrations would have been considerably greater, consistent with our prior work on explosives. For example, Brewer et al. (2017) estimated that, at a similarly heterogeneous site, the average variability of contaminant concentrations for grab samples collected within a 1 m² area was well over an order of magnitude.
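The identity RPD = √2 × RSD for a pair of duplicates follows because, for two values a and b, the sample standard deviation equals |a − b|/√2. The short Python sketch below checks this numerically. The duplicate values and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def rpd_percent(a: float, b: float) -> float:
    """Relative percent difference: |a - b| divided by the mean of a and b, times 100."""
    return 100.0 * abs(a - b) / mean([a, b])


def rsd_percent(values) -> float:
    """Relative standard deviation: sample standard deviation / mean, times 100."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical original and duplicate results (mg/kg)
original, duplicate = 100.0, 300.0
rpd = rpd_percent(original, duplicate)
rsd = rsd_percent([original, duplicate])
print(f"RPD = {rpd:.1f}%, sqrt(2) x RSD = {math.sqrt(2) * rsd:.1f}%")
```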

Table 2. Comparison of spatially collocated samples from Range 4–3 at CEA, Vermont (Cu, Pb, Sb, and Zn in mg/kg)

| Grid ID | Sample | Mass (g) | Cu | Pb | Sb | Zn |
|---|---|---|---|---|---|---|
| Grid 6 | Orig | 147 | 209 | 692 | 5.26 | 60.6 |
| | Dup | 147 | 392 | 1851 | 24.1 | 67.0 |
| | RPD (%) | 0.1 | 15 | 23 | 32 | 3 |
| Grid 7 | Orig | 152 | 248 | 4858 | 27.8 | 60.7 |
| | Dup | 154 | 280 | 1650 | 11.9 | 61.2 |
| | RPD (%) | 0.4 | 3 | 25 | 20 | 0.2 |
| Grid 15 | Orig | 147 | 361 | 2623 | 23.6 | 74.6 |
| | Dup | 156 | 270 | 1930 | 16.4 | 63.4 |
| | RPD (%) | 2 | 7 | 8 | 9 | 4 |
| Grid 22 | Orig | 147 | 252 | 1204 | 9.89 | 60.4 |
| | Dup | 150 | 224 | 501 | 3.48 | 57.6 |
| | RPD (%) | 0.5 | 3 | 21 | 24 | 1 |
| Grid 23 | Orig | 154 | 163 | 319 | 2.69 | 47.6 |
| | Dup | 139 | 229 | 479 | 5.68 | 52.1 |
| | RPD (%) | 3 | 9 | 10 | 18 | 2 |
| Grid 30 | Orig | 158 | 217 | 951 | 9.26 | 55.7 |
| | Dup | 147 | 216 | 555 | 4.17 | 55.0 |
| | RPD (%) | 2 | 0.2 | 13 | 19 | 0.3 |

The relatively large variances, positively skewed distributions, and inconsistent field duplicate results for the anthropogenic metal concentrations raise the question of whether data quality is adequate to obtain reliable estimates of mean concentrations (e.g., calculations of 95% UCLs of the mean). This consideration is important because a typical goal of environmental investigations is to compare sample maxima or 95% UCLs of the mean (USEPA 2000) with risk-based thresholds, such as the USEPA Regional Screening Levels for soil. Additional remedial action or investigation is typically required when the sample maximum or UCL exceeds the regulatory threshold.

Physically resampling the sites multiple times using conventional grab sampling techniques was not practical. We therefore evaluated sample representativeness by assessing the reproducibility of the concentration estimates through computer simulations, which allow the data to be resampled hundreds of times. Resampling used sampling without replacement (i.e., once a value is selected from the dataset it cannot be selected again) with the Resampling Stats version 4.0 add-on for Excel (Resampling Stats 2013). A 95% upper confidence limit (UCL) was then calculated for each of the 200 simulated sets of grab samples using ProUCL version 5 (USEPA 2013).

To evaluate how well seven samples could represent the estimated mean for the Fort Wainwright dataset, seven results were selected at random, without replacement, from the 48 grab samples. This selection was repeated 200 times, and a 95% UCL was calculated for each of the 200 sets of seven grab samples using ProUCL (USEPA 2013). The resulting Pb UCLs range from 53 to 35,991 mg/kg, an interval spanning nearly three orders of magnitude, with a median of 816 mg/kg. Approximately 30% of the UCLs are less than 400 mg/kg (a common USEPA screening level), about 20% are less than 300 mg/kg, and 14% are less than 200 mg/kg. In contrast, the estimated mean Pb concentration of the 48 samples from Fort Wainwright is 432 mg/kg.

If we assume 432 mg/kg of Pb is approximately equal to the population mean, seven grab samples would result in a false negative 64% of the time. The distribution of UCLs is also positively (right) skewed and exhibits several large outliers (e.g., 35,991 mg/kg Pb with Hall’s Bootstrap). The high degree of sample variability indicates it is not possible to sample this site with seven grab samples and obtain a representative estimate of the mean soil concentration.
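The resampling procedure described above can be sketched in Python with the standard library. This is a minimal illustration, not a reproduction of Resampling Stats or ProUCL. It assumes a Student's t UCL for every subset, whereas ProUCL recommends a method for each dataset. It also uses a small hard-coded t table and draws on hypothetical lognormal data that stand in for the 48 Fort Wainwright results.

```python
import math
import random
from statistics import mean, stdev

# One-sided 95% Student's t critical values t(0.95, df); supports m = 5, 7, 10, 15
T95 = {4: 2.132, 6: 1.943, 9: 1.833, 14: 1.761}


def student_t_ucl95(sample) -> float:
    """95% UCL of the mean assuming approximate normality (Student's t)."""
    n = len(sample)
    return mean(sample) + T95[n - 1] * stdev(sample) / math.sqrt(n)


def simulate_ucls(data, m=7, reps=200, seed=1):
    """Draw `reps` subsets of size m WITHOUT replacement and compute a UCL for each."""
    rng = random.Random(seed)
    return [student_t_ucl95(rng.sample(data, m)) for _ in range(reps)]


def fraction_below(ucls, threshold) -> float:
    """Share of simulated UCLs that fall below a screening level."""
    return sum(u < threshold for u in ucls) / len(ucls)


# Hypothetical stand-in for 48 positively skewed Pb results (mg/kg)
gen = random.Random(0)
data = [gen.lognormvariate(4.5, 1.5) for _ in range(48)]

ucls = simulate_ucls(data)
print(f"UCL range: {min(ucls):.0f} to {max(ucls):.0f} mg/kg")
print(f"share of UCLs below 400 mg/kg: {fraction_below(ucls, 400):.0%}")
```

Fixing the random seeds makes the sketch reproducible. Changing them shows how much the spread of UCLs depends on which extreme values happen to land in each seven-sample subset.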

ProUCL also recommends a UCL calculation method based on the number of samples, the data distribution, and the number of censored (nondetect) results (USEPA 2013). The 95% UCL methods most commonly recommended by ProUCL were Approximate Gamma, Adjusted Gamma, Student's *t*, and Chebyshev [mean, standard deviation (Sd)]; each of the other methods was selected for less than 6% of the sets (Table 3). Of the commonly selected methods, the Adjusted Gamma and Chebyshev tended to yield the largest UCLs, with a median of approximately 6000 mg/kg Pb for each. The Student's *t* method produced the smallest UCLs, with a median of roughly 200 mg/kg Pb. The Approximate Gamma UCLs tended to be several times larger than the Student's *t* UCLs, with a median of about 600 mg/kg. The disparity reflects how the UCL calculators accommodate the data variance and skewness caused by contaminant heterogeneity.

Table 3. Descriptive statistics for the Pb 95% UCLs (mg/kg) calculated by different methods (n = number of the 200 simulated sets for which ProUCL recommended each method)

| Method | n | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|
| Adjusted Gamma | 48 | 6169 | 2611 | 691 | 6356 | 12,983 |
| Approximate Gamma | 83 | 837 | 812 | 152 | 621 | 5241 |
| Chebyshev Mean Sd | 6 | 1116 | 856 | 448 | 747 | 2767 |
| H-UCL | 1 | 607 | NA | 607 | 607 | 607 |
| Hall's Bootstrap | 5 | 18,561 | 12,512 | 1773 | 17,805 | 35,991 |
| Student's *t* | 37 | 221 | 147 | 53 | 170 | 642 |
| Chebyshev | 20 | 5455 | 1718 | 1613 | 5928 | 7268 |
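To show why methods applied to the same data can disagree so widely, the Python sketch below compares a Student's *t* UCL with a Chebyshev (mean, Sd) UCL for one hypothetical seven-sample set. It assumes the Chebyshev-inequality form mean + √(1/α − 1)·s/√n. Both bounds add a multiple of s/√n to the mean, so their difference comes entirely from the multiplier. The gamma-based UCLs are not shown because they require estimating gamma distribution parameters. Function names and data are illustrative only.

```python
import math
from statistics import mean, stdev

T95_DF6 = 1.943  # one-sided 95% Student's t critical value for n = 7 (df = 6)


def student_t_ucl95(sample) -> float:
    """Student's t 95% UCL; this sketch supports only n = 7."""
    assert len(sample) == 7, "critical value above is for n = 7"
    return mean(sample) + T95_DF6 * stdev(sample) / math.sqrt(7)


def chebyshev_ucl(sample, alpha: float = 0.05) -> float:
    """Chebyshev (mean, Sd) UCL: mean + sqrt(1/alpha - 1) * s / sqrt(n)."""
    n = len(sample)
    k = math.sqrt(1.0 / alpha - 1.0)  # about 4.36 for alpha = 0.05
    return mean(sample) + k * stdev(sample) / math.sqrt(n)


# One hypothetical seven-sample set (mg/kg) with a single fragment-rich result
subset = [20, 35, 60, 85, 150, 400, 4500]
print(f"Student's t UCL: {student_t_ucl95(subset):.0f} mg/kg")
print(f"Chebyshev UCL:   {chebyshev_ucl(subset):.0f} mg/kg")
```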

The combined effect of sample result uncertainty caused by contaminant heterogeneity and the uncertainties associated with the UCL calculators compounds the problem of obtaining a reliable estimate of the mean for a SAR. To further assess how a small sample set affects the uncertainty associated with the mean, a bootstrap method was applied to the set of 48 grab samples to calculate a confidence limit of the population mean. The simulated UCLs were then compared with the confidence interval (CI) for the population mean to evaluate bias, expressed as the number of times the simulated UCLs fall outside the CI and thus over- or underestimate the population mean. Seven results were randomly selected from the set of 48 grab samples by sampling with replacement 10,000 times (i.e., each concentration could be selected more than once) to calculate a non-parametric bootstrap confidence limit of the population mean. The one-sided 95% UCL of the mean was 679 mg/kg, and the two-sided 95% CI ranged from 193 to 736 mg/kg. Consequently, the Pb population mean is unlikely to be less than 200 or greater than 700 mg/kg, an assumption used to evaluate the simulated UCLs. The simulated UCLs overestimated the mean 55% of the time (i.e., 55% of the 200 UCLs exceed 700 mg/kg). About 15% of the UCLs are at least one order of magnitude larger than 700 mg/kg, and 14% are less than 200 mg/kg. Therefore, 69% of the time, the sets of seven grab samples yield a UCL biased either high or low relative to the population mean; only 31% of the simulated UCLs fall between 200 and 700 mg/kg. Clearly, seven grab samples are insufficient to provide a representative and reliable estimate of the population mean for the area of interest. We therefore conducted additional resampling simulations to estimate, if possible, how many more samples are needed to obtain results representative of the mean of the SAR.
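The comparison above combines two steps: a percentile bootstrap CI for the mean, and a tally of simulated UCLs below, inside, and above that interval. The Python sketch below illustrates both under stated assumptions. The resample size `k` is a parameter that defaults to the full dataset size, the conventional choice for a bootstrap CI of the mean. The data are hypothetical, and the function names are illustrative.

```python
import random
from statistics import mean


def bootstrap_ci_mean(data, reps=10_000, conf=0.95, k=None, seed=1):
    """Percentile bootstrap CI for the mean.

    Each replicate draws k values WITH replacement (k defaults to len(data))
    and records the replicate mean; the CI is read from the sorted means.
    """
    rng = random.Random(seed)
    k = len(data) if k is None else k
    boot = sorted(mean(rng.choices(data, k=k)) for _ in range(reps))
    lo = boot[round((1 - conf) / 2 * reps)]
    hi = boot[round((1 + conf) / 2 * reps) - 1]
    return lo, hi


def classify_ucls(ucls, lo, hi):
    """Fractions of UCLs below, inside, and above the interval [lo, hi]."""
    below = sum(u < lo for u in ucls) / len(ucls)
    above = sum(u > hi for u in ucls) / len(ucls)
    return below, 1.0 - below - above, above


# Hypothetical skewed stand-in for the 48 Pb results (mg/kg)
gen = random.Random(0)
data = [gen.lognormvariate(4.5, 1.5) for _ in range(48)]
lo, hi = bootstrap_ci_mean(data)
print(f"bootstrap 95% CI of the mean: {lo:.0f} to {hi:.0f} mg/kg")
```

Passing the UCLs from a subset simulation to `classify_ucls` together with this interval gives the "biased low / within / biased high" breakdown described in the text.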

Using the Fort Wainwright Pb data, we performed resampling simulations by randomly selecting *m* (sample size) = 5, 7, 10, 15, 20, 25, 30, and 35 results from the set of 48 grab results. For each value of *m*, we repeated the process 300 times, calculated the mean of each of the 300 subsets, and plotted these means against *m* to assess qualitatively how the variability of the mean depends on sample size. As shown by the conical pattern of the plotted values in Fig. 2, the variability is large for small values of *m* and decreases as *m* increases. This is expected from sampling theory, because the standard error of the mean decreases approximately as 1/√*m*. The values predominantly fall within the 95% CI of the mean (200–700 mg/kg) when *m* is at least 15–30. However, this range (200–700 mg/kg) is so wide that reliable risk decisions are not possible. This is particularly problematic when an action criterion falls within this range, as does the USEPA Pb action level of 400 mg/kg.
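The subset-mean simulation behind the conical pattern can be sketched as follows. For each sample size *m*, the code draws 300 subsets without replacement and reports the central 95% band of their means. The data are a hypothetical lognormal stand-in for the 48 Pb results, not the study's values, and the helper names are illustrative.

```python
import random
from statistics import mean


def subset_means(data, m, reps=300, seed=1):
    """Means of `reps` random subsets of size m drawn without replacement."""
    rng = random.Random(seed)
    return [mean(rng.sample(data, m)) for _ in range(reps)]


def central_band(values, coverage=0.95):
    """Empirical interval holding roughly the central `coverage` share of the values."""
    v = sorted(values)
    tail = (1.0 - coverage) / 2.0
    return v[int(tail * (len(v) - 1))], v[int((1.0 - tail) * (len(v) - 1))]


# Hypothetical skewed stand-in for the 48 Fort Wainwright Pb results (mg/kg)
gen = random.Random(0)
data = [gen.lognormvariate(4.5, 1.5) for _ in range(48)]

for m in (5, 7, 10, 15, 20, 25, 30, 35):
    lo, hi = central_band(subset_means(data, m))
    print(f"m = {m:2d}: central 95% of subset means {lo:7.0f} to {hi:7.0f} mg/kg")
```

As *m* approaches the dataset size, subsets overlap more and their means converge. Part of the narrowing at large *m* therefore reflects the finite pool of 48 results rather than the site itself.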

In addition, the frequency of values less than the USEPA screening level of 400 mg/kg increases as *m* decreases. If we assume the population mean is larger than 400 mg/kg (the mean Pb value of the 48 grab samples is estimated to be 432 mg/kg), sample sizes of fewer than 15 will likely produce relatively large false negative rates. As mentioned earlier, additional sampling of this area with alternative methods indicates that the Pb mean of 432 mg/kg significantly underestimates the actual Pb concentration at the site (Clausen et al. 2017). Increasing *m* lowered the error covariance to approximately 50% at *m* = 35, whereas smaller values of *m* produced much higher covariance error, roughly 200% at *m* = 10. Values of *m* < 35 yielded non-normal, skewed distributions, making it difficult to represent the mean Pb concentration in surface soils. Clearly, a representative mean value for Pb cannot be obtained from fewer than 35 grab samples at this site. Simulations performed with the other contaminant metals (Cu, Sb, and Zn) yielded similar observations, as did simulations performed with all four contaminants at the other three sites. Consequently, the simulations suggest that more than 35 grab samples, three to five times the number often collected, are needed to reduce the decision error when estimating mean anthropogenic metal concentrations in soils at a 300 m² SAR. However, as discussed in Clausen et al. (2017), the grab sample data grossly underestimate the mean Pb concentration in these soils compared with replicate multi-increment samples collected from the same site. It is not clear whether increasing the number of grab samples beyond 35 (e.g., more than 100) would improve data quality, because we were limited to the physical dataset of 48 grab samples. We expect that, had we collected a larger grab sample dataset (e.g., more than 100 samples), the simulation results in Fig. 2 would have been similar (i.e., the error decreases with increasing number of samples). However, in practice, it seems unlikely that feasible sample sizes would be large enough to achieve an acceptable level of uncertainty. Collecting 100 grab samples from a 300 m² area would be impractical for most investigations.
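The false-negative reasoning above can be sketched in Python. The code counts how often the mean of a size-*m* subset, drawn without replacement, falls below 400 mg/kg. Treating that count as a false-negative rate assumes the true site mean exceeds 400 mg/kg. The dataset below is hypothetical, and its overall mean is not guaranteed to exceed the threshold, so the script prints that mean for reference. The function name is illustrative.

```python
import random
from statistics import mean


def below_threshold_rate(data, m, threshold=400.0, reps=300, seed=1):
    """Fraction of size-m subsets (drawn without replacement) whose mean is below threshold.

    This is a false-negative rate only under the assumption that the
    true site mean exceeds the threshold.
    """
    rng = random.Random(seed)
    hits = sum(mean(rng.sample(data, m)) < threshold for _ in range(reps))
    return hits / reps


# Hypothetical skewed dataset (mg/kg); its expected mean, exp(5 + 1.5**2 / 2), is about 457
gen = random.Random(0)
data = [gen.lognormvariate(5.0, 1.5) for _ in range(48)]
print(f"overall sample mean: {mean(data):.0f} mg/kg")
for m in (5, 7, 10, 15, 20, 25, 30, 35):
    print(f"m = {m:2d}: {below_threshold_rate(data, m):.0%} of subset means < 400 mg/kg")
```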

In conclusion, many environmental characterization studies at SARs use grab samples to characterize the presence of metals such as Cu, Pb, Zn, and Sb. However, studies have not examined how contaminant heterogeneity affects the interpretation of grab sample results when making inferences about mean metal concentrations in soils. Environmental studies using grab sampling are commonly driven by cost rather than by data-quality considerations. The results of our study suggest that fewer than 35 grab samples are inadequate to represent the population, and to yield acceptable precision, for 300 m² SAR berms containing heterogeneous distributions of metallic residues. However, because our dataset was limited to 48 samples, our simulations do not indicate how many more than 35 grab samples are needed to yield a representative sample of these sites. Sampling theory suggests that, where metals are heterogeneously distributed at a small-arms range, a representative mean cannot be obtained from grab samples unless a complete site enumeration is conducted (i.e., every discrete location is sampled). The poor reproducibility of collocated field duplicate grab samples also indicates a large degree of uncertainty, consistent with a highly heterogeneous distribution of anthropogenic metals at SARs. Finally, the uncertainty added by the choice of UCL calculation method, combined with contaminant heterogeneity, makes 95% UCLs calculated from grab sample results unpredictable. Sites where metallic residues are heterogeneously distributed will likely require a different sampling approach (e.g., composite or incremental sampling).

## Notes

### Acknowledgements

The authors acknowledge the Department of Defense Environmental Security Technology Certification Program (ESTCP), which provided financial support for this research to Dr. Clausen through ESTCP project ER-0918, *Demonstration of the Attributes of Multi-Increment Sampling and Proper Sample Processing Protocols for the Characterization of Metals on DoD Facilities*. ESTCP reviewed the study design and the final report (Clausen et al. 2013) but had no involvement in writing this paper. The research presented in this paper is from a thesis submitted to the Graduate School at the University of New Hampshire in partial fulfillment of the requirements for a doctoral degree (Clausen 2015).

## References

- Brewer R, Peard J, Heskett M (2017) A critical review of discrete soil sample data reliability: part 1—field study results. Soil Sed Contam 1549–7887. https://doi.org/10.1080/15320383.2017.1244171
- Clausen JL (2015) Sampling of soils with metallic residues collected from military small-arms ranges. Dissertation, University of New Hampshire, Durham, NH. https://search.proquest.com/docview/1697333143?pq-origsite=gscholar
- Clausen J, Korte K (2009) The distribution of metals in soils and pore water at three U.S. military training facilities. Soil Sed Contam 18(5):546–563
- Clausen J, Georgian T, Richardson J, Bednar A, Perron N, Penfold L, Anderson D, Gooch G, Hall T, Butterfield E (2012) Evaluation of sampling and sample preparation modifications for soil containing metal residues. ERDC TR-12-1. U.S. Army Corps of Engineers, Engineer Research and Development Center, Hanover, NH. http://acwc.sdp.sirsi.net/client/search/asset:asset?t:ac=$N/1006020
- Clausen JL, Georgian T, Bednar A, Perron N, Bray A, Tuminello P, Gooch G, Mulherin N, Gelvin A, Beede M, Saari S, Jones W, Tazik S (2013) Demonstration of incremental sampling methodology for soil containing metallic residues. ERDC/CRREL TR-13-9. US Army Corps of Engineers, Engineer Research and Development Center, Cold Regions Research and Engineering Laboratory, Hanover, NH http://acwc.sdp.sirsi.net/client/search/asset/1030080
- Clausen JL, Georgian T, Gardner KH, Douglas A (2017) Applying incremental sampling methodology to soils containing heterogeneously distributed metallic residues to improve risk analysis. Bull Environ Contam Toxicol. https://doi.org/10.1007/s00128-017-2252-x
- Gy PM (1992) Sampling of heterogeneous and dynamic material systems. Elsevier Scientific Publishing Company, New York
- Hadley PW, Mueller SD (2012) Evaluating "hot spots" of soil contamination (Redux). Soil Sed Contam 21:335–350. https://doi.org/10.1080/15320383.2012.664431
- Hadley PW, Crapps E, Hewitt AD (2011) Time for a change of scene. Env Forensics 12:312–318. https://doi.org/10.1080/15275922.2011.622344
- Hewitt A, Jenkins T, Ramsey C, Bjella K, Ranney T, Perron N (2005) Estimating energetic residue loading on military artillery ranges: large decision units. ERDC/CRREL TR-05-7. U.S. Army Corps of Engineers, Engineer Research and Development Center, Cold Regions Research and Engineering Laboratory, Hanover, NH. http://www.dtic.mil/get-tr-doc/pdf?AD=ADA434241
- ITRC (2012) Technical and regulatory guidance: Incremental sampling methodology. ISM-1. Interstate Technology and Regulatory Council, Incremental Sampling Methodology Team, Washington, DC
- Jenkins T, Grant C, Brar G, Thorne P, Schumacher P, Ranney T (1997) Sampling error associated with collection and analysis of soil Samples at TNT contaminated sites. Field Anal Chem Tech 1:151–163
- Jenkins T, Hewitt A, Walsh M, Ranney T, Ramsey C, Grant C, Bjella K (2005a) Representative sampling for energetic compounds at military training ranges. Env Forensics 6:45–55
- Jenkins T, Thiboutot S, Ampleman G, Hewitt A, Walsh ME, Ranney T, Ramsey C, Gran C, Collins C, Brochu S, Bigl S, Pennington J (2005b) Identity and distribution of residues of energetic compounds at military live-fire training ranges. ERDC-TR-05-10. U.S. Army Corps of Engineers, Engineer Research and Development Center, Hanover, NH. http://www.dtic.mil/get-tr-doc/pdf?AD=ADA441160
- Matzke BD, Nuffer LL, Hathaway JE, Sego LH, Pulsipher BA, McKenna S, Wilson JE, Dowson ST, Hassig NL, Murray CJ, Roberts B (2010) Visual Sample Plan version 6.0 user's guide. PNNL-19915. Pacific Northwest National Laboratory, Richland
- Resampling Stats (2013) Statistics.com. Arlington, VA http://www.resample.com
- SAS (2017) JMP Software version 10. Cary
- USEPA (1995) Superfund program representative sampling guidance, volume 1: Soil. EPA/540/R-95/141. U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response, Washington, DC
- USEPA (1996) Method 3050B: acid digestion of sediments, sludges, and soils. Test methods for evaluating solid waste, physical/chemical methods. SW-846. U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response, Washington, DC http://www.epa.gov/osw/hazrad/testmethods/sw846/pdfs/3050b.pdf
- USEPA (2000) USEPA environmental response team, Standard operating procedures for soil sampling. U.S. Environmental Protection Agency. Washington, DC. http://www.epa.gov/region6/qa/qadevtools/mod5_sops/soil_sampling/ertsop2012-soil.pdf
- USEPA (2002) Guidance on choosing a sampling design for environmental data collection for use in developing a quality assurance project plan. EPA/240/R-02/005. U.S. Environmental Protection Agency, Washington, DC
- USEPA (2006) Method 6010C: inductively coupled plasma-atomic emission spectrometry. Test methods for evaluating solid waste, physical/chemical methods. SW-846. Office of Solid Waste and Emergency Response, U.S. Environmental Protection Agency, Washington, DC
- USEPA (2013) ProUCL version 5.0.0 user guide, statistical software for environmental applications for data sets with and without nondetect observations. EPA/600/R-07/41. U.S. Environmental Protection Agency, Washington, DC
- van Ee J, Blum L (1990) A rationale for the assessment of errors in the sampling of soils. USEPA/600/4/-90/013. U.S. Environmental Protection Agency, Environmental Monitoring Systems Laboratory, Las Vegas
- Walsh ME, Ramsey CA, Collins CM, Hewitt AD, Walsh MR, Bjella K, Lambert D, Perron N (2005) Collection methods and laboratory processing of samples from Donnelly Training Area Firing Points Alaska 2003. ERDC/CRREL TR-05-6. U.S. Army Corps of Engineers, Engineer Research and Development Center, Cold Regions Research and Engineering Laboratory, Hanover, NH. http://www.crrel.usace.army.mil/techpub/CRREL_Reports/reports/TR05-6.pdf