The effect of sampling effort on spatial autocorrelation in macrobenthic intertidal invertebrates
The importance of sampling effort in the statistical exploration of spatial autocorrelation is demonstrated for benthic macroinvertebrate assemblages within the intertidal warm-temperate Knysna estuary, South Africa. While the role of spatial scale in determining autocorrelation patterns in ecological populations has been noted, the effects of changing sampling effort (e.g., sample size) have rarely been explored; neither has the nature of any changes with sample size. Invertebrate assemblages were sampled from a single grid lattice comprising 48 sampling stations at four sample sizes (0.0015, 0.0026, 0.0054 and 0.01 m²). Four metrics were investigated: assemblage abundance, frequency (species density), and numbers of the two most abundant species in the area, Simplisetia erythraeensis and Prionospio sexoculata. Spatial autocorrelation was estimated for each sample size from the global Moran’s I. For a range of distance classes, Moran’s I correlograms were constructed; these plotted autocorrelation estimates as a function of the separation distance between point samples. Spatial autocorrelation was present in three of the metrics (assemblage abundance, frequency and Prionospio abundance), but not for Simplisetia abundance. The estimated magnitude of spatial autocorrelation varied across sampling unit sizes for all four assemblage and species metrics (global Moran’s I ranged from − 0.07 to 0.5). Correlograms indicated that optimal sampling interval distances fell in the region of 8 m for Simplisetia and 19 m for the remaining three metrics. These distances indicate the dimensions of the processes (both biotic and abiotic) that determine spatial patterning in the macrobenthic intertidal invertebrates sampled.
Keywords: Sampling effort; Sample size; Moran’s I correlogram; Moran’s I
Spatial dependence is a fundamental property of most ecological datasets that arises because of Tobler’s First Law of Geography: ‘everything is related to everything else, but near things are more related than distant things’ (Tobler, 1970). In ecology, positive spatial dependence means that geographically nearby communities tend to be similar. In most coastal environments, moderately positive spatial dependence can be observed because nearby communities are more alike than communities in general. This is the case, for example, in the macrofaunal abundance and species density of benthic assemblages sampled across a seagrass to sand interface (Barnes & Hamylton, 2013), as well as species and functional groups of macroinvertebrates within intertidal seagrass habitats (Barnes & Hamylton, 2016). Negative spatial dependence arises when nearby communities are less alike than communities are in general. It is less common in coastal environments, but may be seen where competition occurs for resources.
Spatial autocorrelation provides a statistical measurement of spatial dependence. Autocorrelation is a statistically quantifiable property that expresses the idea of near things being more alike than far things. It can be loosely defined as the coincidence of value similarity with locational similarity (Cliff & Ord, 1981). Like spatial dependency, spatial autocorrelation arises because of a pervasive continuity and structure to the real world, in which things rarely change dramatically over short distances. Spatial autocorrelation uses statistical correlation to demonstrate that communities close together in geographical space tend to be more alike than communities that are widely separated. Such an approach has been used to quantify the spatial structure of grazer and microalgal populations of an intertidal sandflat (Pinckney & Sandulli, 1990), and determine territorial patch sizes for copepods (Fleecer et al., 1990; Sandulli & Pinckney, 1999).
Beyond characterising spatial patterns, there are two primary reasons to measure spatial autocorrelation. First, spatial dependency introduces a departure from the independent observations assumption of classical statistics. Accordingly, the measurement of autocorrelation reveals the nature and degree to which samples co-vary over space. With respect to sampling of ecological communities, spatial autocorrelation signifies the presence of, and quantifies the extent of, redundant information in data. In the case of a two-dimensional sampling grid lattice, sample points closer together will likely observe ecological communities that are more similar than those that are further away. This will give rise to low variation in sampled communities, incorporating statistically redundant information, as well as wasting field effort. This was recognised in a study that employed measures of autocorrelation to characterise clustering of abalone, sea cucumbers and urchins as an indication of the precision level necessary for diver surveys (McGarvey et al., 2010).
A second reason to measure spatial autocorrelation is so that it can be incorporated into an explanatory or predictive model, often leading to an improvement in model performance. Cressie & Cassie (1993) note that explicitly accounting for spatial autocorrelation tends to increase the percentage of observed variation in a dependent variable explained by a model. This has been demonstrated in spatial models that assess the role of environmental variables (sediment grain size and inundation time per 12.25 h tidal cycle) in structuring patchiness of intertidal macrobenthic invertebrates in sand flats in the Wadden Sea, The Netherlands (Kraan et al., 2009, 2010). Similarly, the incorporation of an autocorrelated error term into a spatial model strengthened the link between benthic community characteristics (abundance, species number, richness, diversity) and environmental characteristics in St Anns Bay, Nova Scotia, Canada (Dowd et al., 2014). The inference employed in such models is based on statistical evidence and reasoning, which relies on assumptions about the data they use. One common assumption states that model error terms are independent and identically distributed. The presence of spatial autocorrelation in the residuals of a regression model is an indication that this assumption has been violated, which may lead to incorrect statistical inference (e.g., positive spatial autocorrelation results in an increased tendency to reject the null hypothesis when it is true). This provides a compelling statistical reason for measuring autocorrelation.
While spatial autocorrelation has been widely reported in ecology (Legendre & Fortin, 1989; Levin, 1992; Legendre, 1993), relatively few studies have focused attention on the influence of community sampling effort on autocorrelation estimates. This is in spite of several practical benefits that arise from doing so, including the ability to empirically define optimal sampling interval distances that will yield a dataset that can support statistically valid inference.
It is known that soft bottom macroinvertebrate assemblages are not randomly distributed in space; rather, characteristics such as the number of individuals, species density and species diversity are spatially structured across intertidal environments. This is particularly the case in transitional zones such as the seagrass to sand interface, giving rise to local patterns in autocorrelation measures such as Geary’s C statistic (Barnes & Hamylton, 2013, 2016). It is also known that abundance and occupancy characteristics of benthic invertebrate communities are scale-dependent in the sense that they are commonly calculated for a given sampling area. It follows, therefore, that associated estimates of spatial autocorrelation of both individual species and assemblage abundances are also dependent on scale.
In the context of sampling, the notion of ‘scale’ can be subdivided into three components: grain size (the size of the sampling units), extent (the total length, area or volume included in the study) and interval (the average distance between samples) (Legendre & Legendre, 2012). Several studies have shown that the size of the area investigated influences the spatial statistical properties of sample data collected for vegetation indices derived from remotely sensed data and landscape patterning (Jelinski & Wu, 1996; Qi & Wu, 1996; Dungan et al., 2002; Dale & Fortin, 2014).
The aims of this study were to:
(i) explore the influence of sampling effort (sample size) on global estimates of autocorrelation, and
(ii) investigate the nature of any variation in autocorrelation estimates with changes to the size of the unit sample, using a Moran’s I correlogram function to describe spatial structure across paired samples of varying distances apart.
Invertebrate assemblages were explored at four different ‘sampling efforts’, herein referred to as sample sizes (0.0015, 0.0026, 0.0054 and 0.01 m²). These were selected as they were ecologically meaningful in relation to the geographical scale of the feeding behaviour and mobility of the species present (Dauer, 1985). Four assemblage and species metrics were calculated from these samples for investigation: assemblage abundance, assemblage frequency (species density), and numbers of the two most abundant species in the area, the polychaetes Simplisetia erythraeensis and Prionospio sexoculata, herein referred to simply as Simplisetia and Prionospio (Barnes & Barnes, 2014). Together these species comprised > 45% of total numbers, each occurring in 94% of stations. For each of these different sampling unit sizes and metrics, we estimated spatial autocorrelation from Moran’s I. Moran’s I correlograms were plotted to compare local autocorrelation estimates, both across the sample sizes and across varying distance classes that incorporated different numbers of samples within the grid.
We hypothesise that estimates of spatial autocorrelation will be dependent on sample size because the processes that give rise to the autocorrelation operate at different spatial scales. For example, the microphagous Prionospio worm lives in a small tube and is relatively immobile. Estimates of spatial autocorrelation made at larger sample sizes are, therefore, more likely to detect processes operating over broader geographic areas that determine the initial siting of their tubes, such as the availability of particulate food for suspension feeding. However, these sample sizes would be too large to capture meaningful spatial variation in the context of Prionospio behaviour.
Materials and methods
Study area and sampling protocol
All samples were collected immediately after tidal ebb from the area of shore concerned, and were gently sieved through 710 µm mesh on site (Barnes, 2016). Retained material from each core (i) was placed in a large polythene bag of seawater within which seagrass material was shaken vigorously to dislodge all but sessile animals and then discarded; (ii) was then re-sieved and transported immediately to a local field laboratory, and (iii) was there placed in a 30 × 25 cm white tray where the living fauna was visually examined. All samples were searched through methodically to ensure that the complete fauna contained within them were examined; each sample examination was only considered complete when no further animals could be seen after searching for 3 min. Faunal individuals were identified to species level and were counted. Four assemblage and species-level metrics were calculated at each sample size for each station for use in the analysis. These were: the numbers of individuals, pooled for all species (‘abundance’), the numbers of species (‘frequency’) and the numbers of the two most abundant species in the area, Simplisetia and Prionospio. Sessile and mobile species can show different spatial patterning (Davidson et al., 2004), and this study excluded any mobile or semi-sessile animals (e.g., Siphonaria compressa) that had become detached from the seagrass leaves during sampling. All nomenclature is given as listed in the World Register of Marine Species (WoRMS, www.marinespecies.org; accessed March 2016).
Spatial autocorrelation was estimated for samples at both a local and a global scale. The local analysis subdivided data cases into a subset of local samples from the grid according to their position relative to the location for which autocorrelation was to be estimated. The global scale analysis simultaneously analysed all samples together to produce summary estimates of spatial autocorrelation across the entire sampled grid.
Estimation of global autocorrelation: Moran’s I
The global Moran’s I statistic estimates spatial autocorrelation based on the cross-product, which measures the covariance of the values for a given variable at locations i and j. It does this iteratively for every pair of points within the grid lattice by finding the difference between each value and the overall (i.e., total sample grid) mean value. The product of these two differences is then calculated and summed across all point pairs (n) within the grid.
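For reference, the cross-product calculation described above corresponds to the standard textbook form of the global Moran’s I. The notation here is an assumption rather than the authors’ own: N is the number of sampling stations, x_i is the metric value at station i, x̄ is the grid-wide mean, and w_ij is a spatial weight that is non-zero only for the pairs of stations treated as neighbours.

```latex
I = \frac{N}{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}}
    \cdot
    \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{N} (x_i - \bar{x})^{2}}
```

The numerator of the second fraction is the summed cross-product of deviations from the mean; the denominator scales it by the overall variability of the metric.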
This produces a corresponding value of the global Moran’s I statistic that ranges from − 1 (negative autocorrelation) to 1 (positive autocorrelation), with anything above 0.4 considered to be strongly autocorrelated (Griffith, 1987). The level of significance of the result was determined using a randomisation approach calculated on the basis of a reference distribution formed by 999 permutations of spatially random layouts using the same data values. This provided a baseline against which the significance of observed and expected Moran’s I values could be compared. A pseudo-significance level (P value) was generated as the ratio of the number of statistics for the randomly generated data sets that are equal to or exceed the observed statistic plus one, over the number of permutations used plus one (Anselin, 1995).
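The randomisation test described above can be sketched in a few lines of Python. This is a minimal illustration and not the software used in the study: the 8 × 6 lattice layout, the unit station spacing, the binary adjacent-station weights and the function names `morans_i` and `pseudo_p_value` are all assumptions made for the example, and the P value is computed for the upper tail only.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for n values and an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                       # deviations from the grid-wide mean
    cross_products = w * np.outer(z, z)    # weighted products for every pair (i, j)
    return (x.size / w.sum()) * cross_products.sum() / (z ** 2).sum()

def pseudo_p_value(values, weights, n_permutations=999, seed=0):
    """Upper-tail pseudo P value: (permuted I >= observed I, plus 1) / (permutations + 1)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = morans_i(x, weights)
    # Shuffle the same data values across stations to build the reference distribution.
    permuted = np.array([morans_i(rng.permutation(x), weights)
                         for _ in range(n_permutations)])
    n_extreme = int(np.sum(permuted >= observed))
    return observed, (n_extreme + 1) / (n_permutations + 1)

# Hypothetical 8 x 6 lattice of 48 stations at unit spacing (layout is assumed).
gx, gy = np.meshgrid(np.arange(8), np.arange(6))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
weights = ((dist > 0) & (dist <= 1.0)).astype(float)   # adjacent stations only

gradient = coords[:, 0]                                 # smooth trend across the grid
checkerboard = (coords[:, 0] + coords[:, 1]) % 2        # neighbours always differ

i_gradient, p_gradient = pseudo_p_value(gradient, weights)
i_checker = morans_i(checkerboard, weights)
print(f"gradient: I = {i_gradient:.3f}, pseudo P = {p_gradient:.3f}")
print(f"checkerboard: I = {i_checker:.3f}")
```

The two synthetic surfaces bracket the behaviour of the statistic: a smooth gradient gives strongly positive autocorrelation, whereas a checkerboard, in which every neighbour differs, gives strongly negative autocorrelation. With 999 permutations the smallest attainable pseudo P value is 1/1000.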
Exploring the structure of spatial autocorrelation: Moran’s I correlograms
A Moran’s I correlogram was generated that plotted calculated values for the Moran’s I coefficients (y-axis) against measurement distance classes (x-axis). For each distance class, a P value was calculated to test whether the calculated value of Moran’s I departed significantly from that expected. The shape of the Moran’s I correlogram, specifically the divergence of Moran’s I values between the different sample sizes over the distance classes, can be used to assess the variation in spatial autocorrelation calculated at different sample sizes. Values were, therefore, plotted for each of the four sample sizes. A Moran’s I correlogram was employed as this has been found to provide a more effective description of ecological spatial dependence in non-isotropic assemblage distributions (i.e., those with a non-constant mean and variance across space) (Rossi et al., 1992).
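The construction of a correlogram can be illustrated with the following sketch, in which Moran’s I is recomputed for each distance class using only the pairs of stations whose separation falls within that class. The lattice layout, the class boundaries, the binary weighting and the function name `morans_i_correlogram` are assumptions made for the example and are not taken from the study.

```python
import numpy as np

def morans_i_correlogram(coords, values, bin_edges):
    """Moran's I per distance class; a pair is in class k when lo_k < distance <= hi_k."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    z = x - x.mean()
    cross = np.outer(z, z)
    variance_term = (z ** 2).sum()
    rows = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        w = ((dist > lo) & (dist <= hi)).astype(float)   # binary weights for this class
        total = w.sum()
        if total > 0:
            i_value = (x.size / total) * (w * cross).sum() / variance_term
        else:
            i_value = float("nan")                       # no pairs: Moran's I undefined
        # (class midpoint, Moran's I, number of unordered station pairs)
        rows.append((0.5 * (lo + hi), i_value, int(total // 2)))
    return rows

# Hypothetical 8 x 6 lattice at unit spacing with a smooth trend along one axis.
gx, gy = np.meshgrid(np.arange(8), np.arange(6))
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
values = coords[:, 0]

edges = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0]             # assumed distance classes
correlogram = morans_i_correlogram(coords, values, edges)
for midpoint, i_value, n_pairs in correlogram:
    print(f"{midpoint:4.2f}  I = {i_value:+.3f}  pairs = {n_pairs}")
```

For a surface with a simple gradient, the correlogram is positive at short separation distances and negative at the longest ones. Repeating the calculation for each sample size and overlaying the curves gives the kind of comparison described in the text.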
The problem of multiple testing occurs when several tests of significance are carried out on the same dataset simultaneously and the probability of a type I error becomes larger than the nominal value α (Legendre & Legendre, 2012). To correct for this, probability values were adjusted upwards before comparison to the unadjusted α significance level, in line with Holm’s (1979) procedure.
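Holm’s sequentially rejective adjustment can be sketched as follows. The P values are ranked from smallest to largest, the k-th smallest is multiplied by (m − k + 1), where m is the number of tests, and the results are forced to be non-decreasing and capped at 1. The function name `holm_adjust` and the example P values are hypothetical and are not results from the study.

```python
import numpy as np

def holm_adjust(p_values):
    """Holm (1979) adjusted P values, returned in the original order of the input."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # rank tests from smallest P upwards
    scaled = (m - np.arange(m)) * p[order]         # multipliers m, m-1, ..., 1
    # Enforce monotonicity along the ranking, then cap at 1.
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical P values for four distance classes of one correlogram.
raw = [0.001, 0.04, 0.03, 0.2]
adjusted = holm_adjust(raw)
print(adjusted)                                    # compare each with the unadjusted alpha
```

Each adjusted value is then compared directly with the nominal α, which is equivalent to comparing the ranked raw P values against progressively less stringent thresholds.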
Estimation of global autocorrelation: Moran’s I
Estimation and exploration of local autocorrelation: Moran’s I correlograms
Spatial autocorrelation patterns observed from visual inspection of the graduated symbol maps were also evident in the numerical global estimates of Moran’s I. Positive, statistically significant autocorrelation was evident in three of the four metrics tested (abundance, frequency and Prionospio abundance). Simplisetia abundance did not show a discernible visual pattern and it was estimated to be weakly negatively autocorrelated, which was not statistically significant. While Simplisetia and Prionospio both inhabit small tubes, they may exhibit different autocorrelation characteristics because Prionospio lives permanently inside its tube, from which its head and ciliated tentacles protrude for filter feeding on diatoms (Dauer, 1985). In contrast, Simplisetia are free to move about. By analogy with the presumably ecologically equivalent Hediste diversicolor, they are likely to be highly generalist feeders, preying on a variety of small animals, fine seaweeds (biting through and collecting animals or algae with their powerful jaws), mud (extracting detrital materials) and carrion. Although Simplisetia may perform a range of different feeding strategies, it likely adopts a hierarchy of food preference to maximise net energy gain as does H. diversicolor (Pashley, 1985; Riisgård, 1991).
Both the global estimates of Moran’s I and the Moran’s I correlograms indicated that the magnitude of estimated spatial autocorrelation varied across the different sample sizes. Such a finding has been noted for vegetation communities (Fortin, 1999), landscape patterns (Qi & Wu, 1996) and also for inshore seagrass plant communities (Yamakita & Nakaoka, 2009, 2011). This is the first time, to the best of the authors’ knowledge, that this variability across sample sizes has been recorded for benthic macroinvertebrates. This demonstrates the importance of considering sampling parameters in the study of autocorrelation in benthic macroinvertebrate assemblages, particularly for the identification of dimensions and causes of spatial patterning in invertebrate populations. Indeed, where autocorrelation itself is of interest, it may be prudent to estimate this at multiple sample sizes (scales) to account for this variability.
Variation in spatial autocorrelation estimates across the different sample sizes was greater for the species-level analysis than the assemblage analysis. This was perhaps because of a loss of variability among the individual population members in the aggregated assemblage metrics, as has been noted elsewhere in ecological datasets (Dale & Fortin, 2014). Core sample size thus had a greater influence on the spatial autocorrelation estimates for individual species. Indeed, for Simplisetia, the local Moran’s I calculated within a distance of 15 m was slightly positive for the moderate sample sizes, and negative for the maximum and minimum core sizes (ranging from − 0.34 at the 0.0054 m² core size to 0.07 at the 0.0026 m² core size).
The Moran’s I correlograms for abundance, frequency and Prionospio abundance indicated that autocorrelation approached zero around a geographical range of 19 m, while the Simplisetia correlogram approached a Moran’s I of zero at a distance of around 8 m. This indicated variation in the dimensions of the spatial processes underlying autocorrelation for the two separate species tested, which was consistent across all four sample sizes. Absence of spatial autocorrelation beyond this geographical range would explain the low magnitude and lack of consistency in global spatial autocorrelation estimates, both across the sample sizes and across all four of the assemblage and species metrics. Such a distance goes well beyond the physical mobility range of these individual polychaetes, suggesting that their distribution is shaped by a property of the collective assemblage, rather than by individuals’ behaviour. An optimal sampling interval distance would, therefore, incorporate a spatial lag (a minimum sampling distance interval) of 19 m for Prionospio and 8 m for Simplisetia. This lag would avoid information redundancy and, where statistical inference is to be drawn from the data, it would be supported by a truly independent set of observations (Hamylton, 2013).
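One simple way to read a minimum sampling interval off a correlogram is to locate the separation distance at which Moran’s I first falls to zero. The sketch below does this by linear interpolation between distance-class midpoints. It is an illustration only: the function name `first_zero_crossing` is hypothetical, the correlogram values are invented for the example and are not the study’s estimates, and no claim is made that the distances reported here were derived in this way.

```python
def first_zero_crossing(distances, morans_i_values):
    """Distance at which the correlogram first reaches zero (linear interpolation).

    Returns None when Moran's I stays positive over every distance class.
    """
    pairs = list(zip(distances, morans_i_values))
    if pairs and pairs[0][1] <= 0:
        return pairs[0][0]                 # no positive autocorrelation at the shortest lag
    for (d0, i0), (d1, i1) in zip(pairs[:-1], pairs[1:]):
        if i0 > 0 and i1 <= 0:
            # Interpolate between the last positive and first non-positive class.
            return d0 + (d1 - d0) * i0 / (i0 - i1)
    return None

# Invented correlogram: class midpoints in metres and their Moran's I values.
midpoints = [5.0, 10.0, 15.0, 20.0, 25.0]
i_values = [0.40, 0.25, 0.10, -0.05, -0.02]

lag = first_zero_crossing(midpoints, i_values)
print(f"suggested minimum sampling interval: {lag:.1f} m")
```

Stations spaced further apart than this lag would, under these assumptions, be expected to yield approximately independent observations of the metric concerned.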
The Moran’s I correlograms revealed greater variability between autocorrelation estimates for the different sample sizes at smaller distance intervals (Fig. 4). Similar patterns have been observed in Mantel correlograms plotted across broadly comparable distance ranges (0–15 m) for invertebrates on embayed sandy beaches around Sydney, Australia (Cooke et al., 2014). These were driven by environmental characteristics including sediment grain size, skewness and calcium carbonate content. Environmental gradients that could influence community distribution across larger distances include hydrological dynamics across the tidal interface (Barnes & Ellwood, 2011; Gingold et al., 2011), and salinity gradients along the estuarine axis (Barnes & Ellwood, 2012). Alternatively, biotic factors, such as competition, food supply and trophic interactions, may be important (Alongi & Tietjen, 1980; Pinckney & Sandulli, 1990; Snelgrove et al., 1994). The variability of autocorrelation estimates in seagrass beds has also been found to reduce as the separation distance between points increases, with a marked reduction in autocorrelation at a separation distance of 4 m, which coincided with the typical dimensions of a seagrass patch (Yamakita & Nakaoka, 2009).
Further studies could investigate correlograms of organisms grouped by functional traits, which may shed light on spatial patterning arising through behavioural modes (Rodil et al., 2014). Alternatively, the exploration of indices that incorporate both occupancy and abundance of species within assemblages, such as the index of numerical importance proposed by Barnes (2014) might reveal more about the nature of the processes underlying spatial patterning across spatial scales and sample sizes.
In summary, numerical estimates of spatial autocorrelation for both assemblage and species level metrics in macrobenthic invertebrate assemblages are considerably influenced by the sample size employed. Variability between estimates of spatial autocorrelation was greater for species-level, as opposed to assemblage-level metrics, and at the local scale, i.e., across small distances (< 19 m for assemblage abundance, frequency and Prionospio and < 8 m for Simplisetia). These findings invite greater consideration of the sample size employed when sampling macroinvertebrate populations, highlighting a need to understand the spatial dimensions of relevant environmental determinants and species behaviours before an appropriate sampling protocol can be designed.
RSKB is grateful to: the Smuts Memorial Fund, managed by the University of Cambridge in memory of Jan Christiaan Smuts, and Rhodes University Research Committee for financial support of the fieldwork; and the Rondevlei Scientific Services Offices of SANParks and the Knysna Area Manager, Johan de Klerk, for permission to undertake research in the Knysna Section of the Garden Route National Park.
- Alongi, D. M. & J. H. Tietjen, 1980. Population growth and trophic interactions among freeliving marine nematodes. In Tenore, K. R. & B. C. Coull (eds.), Marine Benthic Dynamics. University of South Carolina Press, Columbia, SC: 151–166.
- Cliff, A. D. & J. K. Ord, 1981. Spatial Processes: Models & Applications, Vol. 44. Pion, London.
- Cressie, N. A. & N. A. Cassie, 1993. Statistics for spatial data, Vol. 900. Wiley, New York.
- Griffith, D. A., 1987. Spatial Autocorrelation. A Primer. Association of American Geographers, Washington DC.
- Holm, S., 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6: 65–70.
- Legendre, P. & L. F. Legendre, 2012. Numerical Ecology, Vol. 24. Elsevier, Amsterdam.
- Lewis III, F. G. & A. W. Stoner, 1981. An examination of methods for sampling macrobenthos in seagrass meadows. Bulletin of Marine Science 31(1): 116–124.
- McGarvey, R., J. E. Feenstra, S. Mayfield & E. V. Sautter, 2010. A diver survey method to quantify the clustering of sedentary invertebrates by the scale of spatial autocorrelation. Marine and Freshwater Research 61(2): 153–162.
- Pashley, H., 1985. Feeding and Optimization: The Foraging Behaviour of Nereis diversicolor (Polychaeta). University of Cambridge, Cambridge.