Background

Ecological processes are spatially influenced on various scales, ranging from global to local scales [1],[2]. In natural communities, the observed spatial pattern is the result of environmental, biological and/or historical drivers [3], which are not exclusive but rather complementary. The existence of spatial structures of species assemblages suggests the influence of at least one structuring factor: i) a spatially distributed environment is the driving force that structures species assemblages according to niche theory [4]; ii) species are assembled on certain spatial scales through the influence of biotic interactions [5]-[10]; and iii) historical contingency, according to neutral theory [10],[11], or stochastic variations in the history of species arrival [12],[13] drive this process, although the scale of the random effect has not been fully identified [14]. It is challenging to determine which process has a larger effect because historical species arrival data and past ecological processes are usually unknown.

When analyzing spatial datasets, striking and puzzling results are found if spatial autocorrelation is ignored because response variables are structured on various spatial scales [15]-[18]. Specific spatially explicit sampling protocols for targeted organisms and different approaches are needed in soil ecology studies [17],[19],[20], although these methods must be used with caution [16],[21]. Geostatistics [22] allows the assessment of the spatial distribution of soil environmental variability and soil organisms [22],[23], but other powerful statistical tools are necessary to model spatial structures on various scales, such as principal coordinates of neighbor matrices (PCNM) [3],[24],[25]. The PCNM approach is part of the distance-based Moran’s eigenvector map (MEM) analysis, which is included in the spatial eigenfunction family of tools [2],[25],[26] and is a powerful statistical method to model spatial structures at all scales; in other words, the environmental variability is linked to community structure on a multi-scale level [3],[24] to obtain new ecological insights [21]. It has also been used to test and separate the niche from neutral mechanisms that influence the community structure [15], although it may appear over-simplistic [27],[28].

To date, few field studies have been performed on the assembly of soil invertebrate communities to infer overall patterns and draw conclusions on the importance of explicitly accounting for multi-spatial scales. Soil organism communities have been reported to be spatially structured due to their response to spatial variability in soil resources [12],[19],[29]-[33], allowing the co-existence of competing species within the same patch in spatially heterogeneous environments [32],[34]. Although complex spatial patterns have been described for soil invertebrates forming patch assemblages that range from the scale of soil aggregates [35] to those of individual plants [36], agricultural lands and natural ecosystems [37]-[42], no study has assessed the multi-scale spatial relationship between soil invertebrates and environmental variability thus far. The influence of disturbance and habitat heterogeneity on Carabidae assemblages has been described recently, but only on the landscape scale [19]. Studies and data analysis using these multi-spatial analysis techniques to perform invertebrate community research are needed, even if caution must also be exercised [43]. In this study, we aimed to i) analyze the spatial location of significant patches and gaps of the species assemblages identified, ii) test whether the relationship between species assemblages and soil environmental variability occurs on very fine (<10 m), fine (10–20 m), and medium scales (>30 m), and iii) investigate the spatial relationship between root traits and soil parameters to test the hypothesis of the “single tree effect” [44].

Results

Earthworm abundance and soil environmental heterogeneity

A total of 688 earthworms were collected and included seven species (Table 1) with three main ecological categories present [45]: epigeics (litter feeders), Aymara sp. and one unclassified species (new genus 1); endogeics (soil feeders), Andiodrilus sp., Andiorrhinus sp., Glossodrilus sp., and one unclassified species (new genus 2); and anecics (soil + litter feeders), Martiodrilus sp.

Table 1 Earthworm abundance and main morphological traits

The CA extracted three axes (72.9% of the total variance), and these three axes were used to discriminate among the various species assemblages according to the axis selected (Figure 1). Axis I (34.2% of the explained variance) discriminated new genus 1 versus all other species, whereas the second axis (21.7% of the total variance) revealed a clear distinction between endogeic and epigeic + anecic species. Moreover, the position of species along the positive side of axis 2 followed a body size increase among endogeic species. Axis 3 (17.1% of the total variance) separated Aymara, Andiodrilus and new genus 1 from Martiodrilus, Glossodrilus and new genus 2.

Figure 1
figure 1

Ordination plot of species in the factorial plan following correspondence analysis of earthworm density (N m −2) in the gallery forest: (a), axes 1 and 2; (b), axes 2 and 3; and (c) and (d), “eigenvalues”. The species Andiorrhinus was not included in the analysis because it only represented 1% of the total earthworm abundance.

Patches and gaps of species assemblages

The SADIE spatial Ia index and local vi and vj clustering indices were statistically significant for endogeic species and the group Martiodrilus, Glossodrilus and new genus 2 (1 anecic +2 endogeics), whereas only the vj index was significant for Andiodrilus, Aymara and new genus 1, i.e. one endogeic + two epigeics (Table 2). Significant spatial dissociations were found when using those assemblages identified with CA axes, i.e., −0.232 (p = 0.978) between new genus 1 and the rest of species, −0.278 (p = 0.995) between endogeics and epigecis + anecic group, and −0.383 (p = 0.999) between the group Andiodrilus, Aymara and new genus 1 from Martiodrilus, Glossodrilus and new genus 2 group.

Table 2 SADIE aggregation indices and associated p levels for the various combinations of earthworm assemblages identified in the three axes extracted from the CA

The number of significant clusters of the earthworm assemblages ranged from 1 (new genus 1) to 9 (endogeics), with gaps occupying a larger area than that of patches (Figure 2). The type of litter and tree root traits may influence the patchy distribution of endogeic earthworms, which is known as the “single tree effect” [44]. The endogeic species assemblage was close to A. maripa trees, except for the large patch at the central part of the surveyed plot, where values of the coarse root length and weight (CoRL, CoRW) were the lowest (Figure 3, kriged contour maps).

Figure 2
figure 2

Overlaid contour and classed post maps (surfer) of SADIE clustering indices for counts of the species assemblages identified in the CA. Index values > −1.5 represent significant gaps (blue shading and darker blue dots), and index values >1.5 indicate significant patches (red shading and darker red dots). Black dots indicate units for which clustering exceeds expectation, although not significantly (>2 or < −1). Open dots indicate clustering below expectation (<1 or > −1).

Figure 3
figure 3

Correlogram computed using the factorial coordinates for the corresponding positive (□) and negative (Δ) row scores of the three axes extracted from the CA depicting the spatial autocorrelation of a) assemblages CA1+ (New genus 1) and CA1- (rest of species), b) CA2+ (endogeics) and CA2- (epigeics + anecic), and c) CA3+ ( Andiodrilus , Avmara , new genus 1) and CA3- ( Martiodrilus , Glossodrilus and new genus 2). Black symbols refer to the lag distances at which the Moran’s I coefficients were significant after progressive Bonferroni corrected p-values (p = 0.05/12; p’ = 0.0042). Only the correlograms of the CA2+ and CA3- assemblages were globally significant.

The identity and location of tree species within the surveyed plot did not appear to explain the observed spatial patterns of the remaining species assemblages:

  • ■ new genus 1: a significant gap in the lower half area where multiple tree species, mainly A. maripa, were present. The correlogram was not significant (Figure 4a).

  • ■ All other species: significant gaps and patches were not linked to areas of tree presence. The correlogram was not significant (Figure 4a).

  • ■ Endogeics group: four significant patches close to the location where the A. maripa tree species was observed. The correlogram was significant (Figure 4b).

  • ■ epigeics?+?anecic group: same as described for the rest of species (CA1-). The correlogram was not significant (Figure 4b).

  • Andiodrilus, Aymara and new genus 1: two significant gaps in the lower half area and two significant patches in the plot edge where the A. maripa tree species was found. The correlogram was not significant (Figure 4c).

  • Martiodrilus, Glossodrilus and new genus 2 group: one significant patch in the western zone of the plot where trees were present and a large significant gap in the upper part; another two significant patches were in the eastern zone. The correlogram was significant (Figure 4c).

Figure 4
figure 4

Kriged maps of root-related variables (log-transformed values): FiRL (a) and CoRL (c), length of fine and coarse roots, respectively (m sample −1), and FiRW (b) and CoRW (d), weight of fine and coarse roots, respectively (g dry weight sample −1). Darker areas correspond to lower values.

Cross-correlogram analysis

Regarding the spatial cross-correlation between root-related variables and soil nutrient-related and physical variables, significant positive cross-correlations were identified at short lag distances (h) between the FiRL and SOC0–5, P0–5 and C:N5–10 (Table 3), whereas P5–10 showed a significant negative cross-correlation (Monte Carlo permutation). With regard to the CoRL, the cross-correlation at short distances was positive for SOC5–10 and C:N5–10 and negative for N5–10 and P5–10. Regarding root biomass, the FiRW showed significant positive spatial cross-correlation with P0-5 and P5–10, whereas it was negative for the variables SOC5–10, C:N0–5 and C:N5–10 at short lag distances (Table 3). The CoRW showed a positive spatial relationship with SOC5–10 and C:N5–10 and a negative relationship with P5–10. A significant positive spatial cross-correlation was observed between the FiRL and soil aggregates of less than 5 mm in size, whereas it was negative for larger soil aggregates and moisture at short lag distances (Table 4). Regarding the FiRW, a significant negative spatial cross-correlation at short distances was especially observed for <0.5 and 2–5 mm size soil aggregates, and a positive cross-correlation was observed for >10 mm aggregates and bulk density (BD). Finally, the CoRL showed a positive spatial cross-correlation with 0.125-0.5 and 1–5 mm soil aggregates and hydraulic conductivity, and a negative spatial cross-correlation was observed for >10 mm soil aggregates and soil moisture. The CoRW showed a positive spatial cross-correlation with 0.125-0.25 and 0.5-5 mm size soil aggregates and a negative spatial cross-correlation for >10 mm soil aggregates (Table 4).

Table 3 Cross-correlograms of the root- and nutrient-related soil variables (significant Bonferroni corrected two-sided p-values (0.05/11 = 0.0045) for each distance class were tested using 999 permutations under the null hypothesis)
Table 4 Cross-correlograms of the root- and soil physical variables (significant Bonferroni corrected two-sided p-values (0.05/11 = 0.0045) for each distance class were tested using 999 permutations under the null hypothesis)

Decomposing multiple scale spatial patterns of species assemblages

Significant multi-scale spatial structures were obtained for the earthworm community, species and assemblages, especially in the case of new genus 1. The forward selection procedure resulted in various numbers of PCNM variables, ranging from 1 to 9 out of 69 positively autocorrelated spatial eigenvectors (significant Moran’s I at p < 0.05). Variogram modeling [25] provided information on the scales of PCNM variables. The PCNM eigenfunctions selected to model the distribution of earthworm community are depicted in the Additional file 1.

These parameters indicate clear spatial structures on a medium (>30 m), fine (10–20 m) and very fine scale (<10 m), except for Andiodrilus (which presented only one significant PCNM). Regarding new genus 1, PCNM3 and PCNM8 defined the medium-scale patterns, whereas PCNM12 and PCNM16 encompassed the fine-scale patterns; PCNM29, PCNM33 and PCNM51 described very fine scales (Additional file 1). The maps of the fitted scores of the significant canonical axes in the PCNM analysis for species (A), species assemblages and the whole community (B) are depicted in Additional file 2.

The significant explanatory environmental variables that best described the multi-spatial structure varied for the earthworm community, species and species assemblages (Table 5). The nutrient-related variables explained much of the structure of new genus 1 on the medium and fine scales, whereas the physical variables were better explained on a very fine scale, such as soil compaction (negatively) and humidity (positively). The variables C0–5 (pcorr < 0.001) and moisture content (pcorr < 0.05) contributed positively to the spatial structure model of new genus 1, whereas C5–10 (pcorr < 0.05), N0–5 (pcorr < 0.001), C:N0–5 (p < 0.01) and compaction (pcorr < 0.05) contributed negatively to medium-scale patterns (Table 5).

Table 5 Significant positive/negative relationship between the spatial characteristics of earthworm species and the soil environmental variables measured in this study

With regard to the endogeic Andiodrilus sp., the medium-scale spatial structure was explained by physical variables associated with the size of soil aggregates, although the values were not significant (pcorr > 0.05). When species assemblages were used instead, new environmental variables were detected (Table 5), i.e., variable P0–5 was negatively correlated to the medium-scale pattern of assemblage of endogeics, and litter contributed negatively to this pattern. For the epigeics and anecic assemblage, litter and moisture contributed positively to the medium-scale spatial structure model, although this contribution was not significant (pcorr > 0.05).

Soil environmental control on earthworm species and assemblage spatial patterns

The variation partitioning analysis revealed differences among species regarding the explanatory variables (Table 6). The entire set of environmental and spatial variables explained the various percentages of variation within the community, species and species assemblage. In the case of the earthworm community, the explained variation was 41.9%, of which 32.3% was explained by the soil environment but not the spatial variables (p = 0.005). The environment and fine-scale structure explained 4.98% of the total variation, whereas the environment and medium-scale structure together explained 2.93%. For the species alone, the Ra2 coefficient for the environmental fraction ranged from 1% for Aymara to 48.0% for new genus 1. The medium and fine spatial scales explained 15.4% and 13.4% for Aymara and 2% and 2.2% for new genus 1, respectively. The amount of variation explained only by spatial variables independent of the environment differed among species; it ranged from 1.3% to 28.8% for new genus 2 and Aymara, respectively (Table 6).

Table 6 Significant PCNM variables (spatial models with eigenfunctions associated with a positive Moran's I) for medium, fine and very fine spatial scales and results of the variation partitioning analysis using adjusted R coefficients (R a 2 ), i.e., the amount of variance explained by the environment, the spatial scales and residuals

Discussion

Both spatial and environmental variables structured the species, assemblages and earthworm community, although variations were found in the explained contribution of environmental factors, i.e., 33.3% of the total variation of the global spatial structure of the earthworm community was explained by soil environmental variability. The specific soil environmental variables that were significantly linked to particular spatial scales for species and assemblages were also observed in other studies of nematodes in a forest [35]. To a certain extent, our results agree with Hutchinson’s environmental control model [4], although a large portion of the variation was also linked to a purely spatial component (Table 6).

The selected PCNM variables highlighted significant spatial patterns in earthworm assemblages from a gallery forest, allowing us to identify the spatial scale at which the earthworm community was structured. In general, the very fine scale of autocorrelation detected in our study represents spatial patterns of less than 10 m (PCNMs 33, 51; Figure S1 in Additional file 1), fine scales depicted patterns of 15–20 m (PCNMs 12), and medium scales (PCNMs 3, 5, 8) represented spatial patterns of >30 m (see details in Additional file 1). Furthermore, our observation of very fine-, fine- and medium-scale spatial relationships indicates the importance of considering multiple scales during ecological studies of soil organisms. In our study, we carefully chose the scale used to sample earthworms to focus on small-scale patterns. Additional studies are needed to increase the scale of the sampling design, i.e., hundreds of meters to several kilometres.

The influence of environmental constraints on the spatial distribution of species assemblages has previously been demonstrated in a nearby savanna [29]. Moreover, earthworm activity also contributes to soil heterogeneity [33],[37]. The joint influence of the soil environment and species-created heterogeneity, i.e., the so-called “functional domain” [47] of soil parameters, could explain the spatial patterns observed on several scales. On very fine scales, the environmental variables associated with the spatial distribution of earthworms were more difficult to detect, i.e., the concentration of soil C0–5 and moisture better explained the spatial pattern of new genus 1 on fine and medium scales compared with very fine scales, whereas Andiodrilus sp. was mostly associated with physical variables, such as size class aggregated distributions, because this medium-size species produces compact casts that influence the surrounding soil environment [32]. Very fine scales (PCNMs 32 and 50) may be overlooked by classical multivariate analyses, as their relationships may be masked by those of other explanatory variables associated with larger scales, such as PCNM 3. As a detailed analysis of the soil environmental variables was performed within a relatively small area, some of the variation could be attributed to unmeasured variables, leading to incomplete predictions [48]. Moreover, the fraction attributed solely to space was smaller than all other fractions, except for Aymara (28.8%), Glossodrilus (14.1%), and assemblages epigeics + anecic and Andiodrilus, Aymara and new genus 1, which represented 23.5% and 22.2% of the total variance, respectively.

The factors affecting the spatial distribution of soil organisms at larger scales include gradients in soil organic matter and vegetation structure [49], whereas at very fine scales (<10 m) earthworm spatial distribution could be influenced by local factors, such as the plant characteristics, soil moisture and micro-topography. A variety of plant species is likely to support important levels of soil heterogeneity [50]. The root architecture or fine-scale spatial patterns within the plant community determine the spatial structures of earthworm populations through fine-scale soil environmental variations, which is known as the “single-tree effect” [44]. Through litter input and root leachates, trees directly influence earthworm populations. They also indirectly influence earthworm populations by altering the soil properties, forming patches beneath tree canopies that influence community structure and ecosystem function [51],[52]. Species exclusion was also reported for the savanna grass Imperata brasiliensis, as earthworms become injured upon contact with its sharp-pointed roots [38]. In our study, significant positive cross-correlations were found for the CoRL and CoRW and the soil nutrient- and physical-related variables. These close associations between soil variables and vegetation structure have also been described for epigeic invertebrate assemblages [19]. Regarding the importance of soil environmental variables, it should be noted that the influential factors according to our analyses did not operate independently but interacted (e.g., root traits and nutrients); these complex interactions are characteristic of most ecological systems [53].

The idea that species distributions can be linked to key abiotic variables on multiple scales is not new [54]. Our analysis for empirical data has shown that environmental variables are indeed most important on broad scales, whereas purely spatial patterns appear to dominate on finer scales [55]. Applying PCNM analysis toward large-scale assessment of species-environment relationships is a well-established method [19],[56],[57]. Gilbert and Bennett [43] and Smith and Lundholm [28] criticized the application of variation partitioning to study the relationship between environmental variables and space, although they admitted that it yields useful results when carefully used. Our present analysis supports an optimistic view of this approach. All environmental patterns are spatially correlated on some scale [28], and all ecological processes are spatial to some extent [1]. The common fraction of variance explained jointly by environment and space appears to represent patterns generated by both environmental factors and the limitations on species dispersal [28]. In our study, the highest level of pure spatial variation was obtained for the epigeics + anecic assemblage (23.5%), whereas the endogeics assemblage showed the lowest level (1.6%, Table 6). Species, or even earthworm ecological categories, show specific dispersal behaviors [58], with endogeics typically showing less dispersal than do epigeic and anecic species.

We carefully selected the sampling scale and the spatial statistics tools to address the ecological question at hand [43]. We benefited from our previous knowledge regarding the biology and ecology of the species found in the region where the survey was undertaken [29],[32],[33],[37],[38]. Our study lays the groundwork for further detailed analysis of spatial structuring environmental factors and species assemblages on several scales, while also providing clues for developing an accurate and spatially explicit sampling design for earthworm communities. In addition, the utility of the tools used to select species assemblages and analyze their spatial attributes relative to the soil environmental variability was clearly demonstrated. We are confident that our results provide crucial insight into the spatial relationship between species assemblages and soil environmental variability on scales that range several tens of meters. The selection of assemblages from the correspondence analysis, in this case epigeics versus endogeics, and the statistical methods used to draw our conclusions provide insights that improve understanding regarding why particular species assemblages are found at particular sites. From an ecological point of view, our study not only suggests that specific environmental factors determine the structure and spatial distribution of earthworms in the gallery forest but also indicates that a large proportion of unexplained variation exists. Whether this variation is the result of unmeasured soil environmental variables or null model (random) patterns is a subject for further research.

Conclusions

Earthworms were spatially structured within a relatively small but highly heterogeneous plot; i.e., even at ranges of just a few meters, a multi-scale spatial pattern was observed. The amount of variation jointly explained by the environment and space was not high. However, these sources of variation should not be neglected because they represent unmeasured soil environmental factors and processes that limit species dispersal. Further studies are needed because dispersal traits, for example, remain largely unknown in many earthworm communities. In conclusion, specific abiotic factors were responsible for the observed patterns, and the importance of these patterns needs to be elucidated, even if the multi-scale approach carries additional difficulties and caveats when interpreting the results.

Methods

Study area

Sampling was conducted in a gallery forest (GF) at the CORPOICA-CIAT Carimagua field research station in the Eastern Plains of Colombia (4° 37’ N, 71° 19’ W, 170 m a.s.l.). The study area is a well-drained savanna forming a young alluvial plain consisting of Pleistocene and Holocene sediments of Andean origin. The terrain is characterized by open herbaceous savannas where GFs follow a dense braided drainage network of rivers toward the Orinoco catchment. The yearly average temperature and precipitation are 26°C (iso-hyperthermy) and 2,200 mm, respectively, with clayey Oxisols of low pH (4.2-4.4 in water) and fertility, with low available P (1–2 ppm Bray II) and Al saturation >90% (CIAT data).

The plant community of the GF is characterized by several tree species, including Dendropanax arboreus (L.) Decne. & Planch. (1854) (Araliaceae), Enterolobium spp. (Leguminosae), Ficus spp. (Moraceae), Jacaranda copaia (Aubl.) (Bignoniaceae), Copernicia tectorum (Kunth) Mart. (Caesalpiniaceae), and Cecropia sp. (Cecropiaceae). Other species include Mauritia flexuosa L.f. and Mauritiella (Palmaceae), Attalea maripa (Palmaceae), Nectandra membranacea (Sw.) Griseb. (Lauraceae), Didymopanax morototoni (Aubl.) Decne. & Planch. (Araliaceae), Virola sp. (Myristicaceae), and Hymenaea courbaril L. (Caesalpiniaceae) [59].

Earthworms and soil sampling

Earthworms and soil samples were collected at 100 sampling points evenly distributed within a 45 × 45 m2 grid with 5 m of inter-sample distance (Figure 5). The earthworms were identified, and their abundance was counted in situ from soil blocks of 25 × 25 cm2 and a depth of 20 cm [60]. Previously, the fresh, tower-like casts deposited on the soil surface by Martiodrilus sp. (anecic) were counted at each point within 1 m2 quadrats, as they are reliable indicators of the number of active individuals [61].

Figure 5
figure 5

Sampling protocol used with a regular grid of 10 x 10 sampling points and a 5 m inter-sample distance. A total of 400 soil samples were collected for the various soil analyses and 100 soil monoliths for earthworm species counts. The location of tree species (>5 m diameter at breast height, DBH) within the surveyed 0.2 ha plot are shown together with the soil pit where the earthworms were sampled, identified and counted, as well as the four soil cores taken for physical and chemical determinations from each of the 100 sampling points.

In total, 400 soil samples were collected for physical and chemical analyses. Soil cores were collected along the four sides of each sampling point (Figure 5). The core method (5 cm depth and 5 cm diam. metal cylinder) was used for bulk density (soil dry mass per volume) following [62]. Water content (soil water per volume and soil water per dry mass) was determined gravimetrically, and hydric conductivity and susceptibility to compaction were also measured [63].

A second core (10 cm depth and 5 cm diam. metal cylinder) was used for soil organic carbon (SOC), nitrogen (N) and phosphorous (P) measurements at 0–5 and 5–10 cm. The soil was then oven dried at 75°C for 48 h and finely ground (<200 μm). A standard colorimetric method was used after digestion in H2SO4 to measure SOC, and the Kjeldahl method was used to assess the total N. Available P was determined using the Bray-II extraction method. C:N and C:P ratios were calculated as the SOC concentration divided by the total N and P concentrations, respectively.

Another soil core (15 cm depth and 10 cm diam.) was taken to assess the size-class aggregate distribution; ca. 100 g of air-dried soil was mechanically shaken in a sieve column of 4.75, 2, 1, 0.5 and 0.250 mm for 30 min. The last soil core (15 cm depth and 10 cm diam.) was used for root length and biomass quantification. Soil was washed in the lab and sieved to separate the fine (<2 mm) and coarse roots (>2 mm) and then oven dried at 105°C for 48 h.

Resistance to penetration (RP) was measured (3 replicates) using a hand penetrometer at each sampling point. The soil moisture content (volumetric) of the topsoil at the time of sampling was ca. 38% (pF = 2.8).

Multivariate ordination analysis (CA)

Species abundance (raw data) was analyzed using correspondence analysis (CA). When the species abundance was <5% of the total, it was removed from the species matrix. The extracted factorial axes allowed us to identify various species assemblages, i.e. these were defined according to the sum of the individuals of all species linked to positive or negative row scores of the three axes.

Species assemblage patches and gaps

SADIE (Spatial Analysis Distance IndicEs) analysis [64],[65] was used to assess the presence of significant patches and gaps within species assemblages. The index uses count data, i.e., the total number of individuals who corresponded to any of the assemblages identified per sampling point. A global index of aggregation (Ia) is computed:

I a =D/Ea,

where D is the distance moved to achieve the regular pattern for the observed data and Ea is the arithmetic mean distance to regularity for non-regular randomized samples [64].

Ia equals 1 for a random distribution, whereas it is >1 or <1 for either a clumped (aggregated) or regular spatial pattern, respectively [65].

SADIE identifies clusters of high (patches) and low (gaps) mean density, respectively, and these clusters are categorized as vi (positive) and vj (negative cluster index). A patch or gap comprises at least one sample location where the cluster index (vi or vj) is significant at the heuristic threshold of 1.5 and −1.5, respectively. Adjacent sample locations with significant index values form a single cluster [65]. The observed vi or vj indices are tested using random permutations against the H0 of complete spatial independence of counts [66].

In this study, we used positive and negative row scores extracted from the CA to obtain count data and compute the SADIE vi and vj cluster indices. Factorial coordinates have been used as a typical procedure to analyze the inner structures of data matrices for community analysis [29],[33],[37]-[39],[67],[68]. Because the row scores and factorial coordinates are not count data, which is a requisite for applying SADIE statistics, the various assemblages were obtained by summing the earthworm count data linked to the positive and negative row scores along the CA axes.

Finally, a spatial association/dissociation index was computed between species assemblage pairs [66]. The local association indices calculated from their individual sampling-unit clustering indices are correlated between species assemblage pairs. The observed value of the association index is tested against the H0 of complete spatial independence of the counts (based on random permutations). The two-tailed associated probability levels at α = 5% are <0.025 and >0.975 for significant association and dissociation, respectively [66].

Spatial autocorrelation analysis

In the presence of a spatial dependence, the observation made at one location is more similar to observations made at nearby sites [2], breaking the rule of sample independence for statistical analyses [23]. To assess the degree of spatial autocorrelation, the (semi)-variogram is a function that describes the spatial pattern of any variable with increasing inter-sample (lag) distance. When positive autocorrelation exists, the semi-variance γ(h) increases until it reaches a maximum value (the “sill”) for a given lag distance, which is referred to as the range. This parameter defines the limit of spatial dependence of the variable concerned (detailed in [22],[69]). Estimated values of γ(h) are adjusted using a theoretical model [70],[71] that is later applied with an interpolation technique called “kriging” to estimate values of the variable under study at non-sampled sites [22]. In our study, interpolated maps were used for root-related variables using only the modeled parameters obtained in the variogram [33].

Cross-variograms can be calculated to assess how two variables co-vary in space [72]. Similar to univariate variograms, cross-covariances may be computed using the values of two distinct variables observed at locations separated by lag h [73]. However, variograms and cross-variograms are not associated with formal testing for departures from randomness (an r2 correlation coefficient could be used to adjust the curve). However, in our study, spatial cross-correlation among the root-related variables, soil nutrient contents and physical variables was assessed by calculating the spatial cross-correlogram [55],[74]. A spatial autocorrelation coefficient, named Moran’s I, is plotted in the correlogram for increasing distance classes [75]. Data were allocated to 11 distance classes with a minimum of 50 pairs of points for each distance class to compute the cross-correlogram. The significance of the correlogram is tested with a Monte Carlo simulation [20]; it is significant when at least one coefficient is lower than the Bonferroni corrected p′ of α′ = α/k for the k distance classes used [76]. Data normality was tested with a Kolmogorov–Smirnov test; when the normality assumption was not confirmed, a Box–Cox transformation was used [77]. The gstat and ncf packages of the R program 2.15.1 [78] were used to compute the variograms and cross-correlograms and to depict the kriged maps.

Principal coordinates of neighbor matrices (PCNM) and variation partitioning

The multi-scale spatial analysis of fauna data and soil environmental variability was performed using PCNM analysis [24],[79]. This method allows to capture extremely complex structures [80] and is based on the principal coordinate analysis (PCoA) of a truncated pairwise geographic distance matrix between sampling sites [25]. It creates PCNM variables (spatial predictors or eigenfunctions) and a spectral decomposition of spatial relationships from broad to fine spatial scales [81] that is encompassed by the data matrix among sampled sites and then determines to which PCNM variables the response data (uni- or multivariate) respond statistically [79]. Only spatial eigenfunctions associated with positive eigenvalues based on Moran’s I were used to define the spatial structures [3], which represents a highly conservative method due to its penalization of degrees of freedom and adjusted R2 statistics [43].

The PCNM variables that significantly contribute toward explaining the species response data are grouped into a small number of submodels, whereas they are normally assigned to broad, intermediate, and fine scales. The predicted values generated for each submodel can then be reanalyzed using canonical analysis against environmental variables to identify the environmental variables associated with species distributions on the scale represented by each submodel [25]. The forward-selection procedure was used [82] to reduce Type I error, as it is known to underestimate the residual variance [83]. In other words, the probability of selecting at least one PCNM is greater than the chosen significance level, even if the response variable is not spatially structured [80]. Appropriate and rigorous approaches for submodel selection have been argued by [43] and [81] is support of improving the methodological developments in MEM-based methods. Scale is generally defined according to the main features of the sampling design, such as the extent of the study area or the size and spacing of the sampling units [84]. Given the dimensions of the plot (45 × 45 m) and our knowledge on the spatial distribution of various earthworm species in the area [33],[37],[38], the scales were grouped for convenience into medium (>30 m), fine (10–20 m) and very fine (<10 m).

The next step is to relate the spatial components using significant Bonferroni-adjusted p values extracted from the species matrix with soil variables. In other words, species-soil environment regression analysis is performed independently on each scale identified by the PCNM variables. Variogram analysis for each PCNM variable is performed to identify the spatial scale at which the relationship was significant. The lowest value of the Akaike information criterion (AIC) identifies the best spatial model.

Partitioning the variation of a species response data table among two or more explanatory tables is performed using multivariate variation partitioning [85], which determines how much of the species variation is spatially structured and associated with the measured environmental variables [57],[80]. The variation partitioning analysis is based on the adjusted R2 statistic Ra2 [86], and patterns on finer scales identified by the PCNM variables appear smoother compared to other spatially explicit models, such as nested variograms and filter kriging [79].

The data matrix included count data for 6 earthworm species, 23 soil environmental variables (Additional files 3 and 4) and xy coordinates for 100 sampling points. It has been recommended that fauna data should be detrended and transformed during PCNM analysis. Contrary to correspondence analysis, earthworm abundance data were Hellinger transformed because PCNM has been found to be inappropriate for raw data that includes many null abundances [87]. Earthworm spatial distribution is represented by patches, and clearly meaningful trends are rarely observed; consequently, the data were not detrended. In other words, we were cautious with the use of tests to determine the presence of trends because they are likely not appropriate for patchy patterns. The packages vegan, mass and packfor were used for all the calculations needed during PCNM analysis in R 2.15.1 [78].

Adjustment of the probability level

The α < 0.05 probability level was corrected using the false discovery rate (FDR) procedure for multiple comparisons [88], in which the power of multiple tests is optimized while controlling for the proportion of significant results that might be Type I errors. The p values from the individual tests are used to perform the corrections and search for significant differences at the corrected probability level. The comparison starts with the highest p value obtained from the individual tests, then each value is checked until encountering the first value that meets the requirement, i.e., the highest p value that is smaller than the corrected p [89]. The transformations include the following:

p(i) ≤ (α/m)*i , where m is the number of tests (variables) and i is the test (variable) ranked in ascending order, i.e., p(1) ≤ ….. ≤ p(m). The final p value corresponded to the following correction:

p corr = 0.05 × number of variables /rankedp maximum

During PCNM analysis, three tests were performed that corresponded to the three spatial scales used: the medium, fine and very fine scales.

Additional files