Challenges in the experimental design of agroforestry research sites

Temperate agroforestry comprises silvoarable and silvopastoral systems that combine management of trees with cultivation of arable crops and/or permanent grasslands with or without livestock. Agroforestry systems are more complex in comparison to other agricultural land-use systems such as monocropping systems or annual monocultures because of biological interactions between their components (Scherr 1991; Jacobs et al. 2022). In addition, each component, i.e., the trees and crops or grassland, requires its own management which accounts for agricultural cycles that take place at different spatiotemporal scales (Scherr 1991).

This inherent complexity hinders the application of traditional statistical methods in the analysis of agroforestry experiments (Balandier and Dupraz 1998; Birteeb et al. 2020) as many of the available statistical tools were developed for the analysis of bio-physico-chemical properties in agricultural experiments involving annual crops (Verdooren 2020). In case of the latter, the number of plots (including replications and control plots) is determined by the number of treatments and treatment combinations, the variance of the target parameters and the required level of statistical significance and power (Kumle et al. 2021; Piepho et al. 2022). In case of the former, the plot size and distance must be sufficient to account for intra-system interactions and to avoid confounding effects of neighboring treatments.

Trees have been found to influence neighboring treatments above- and belowground (Somarriba et al. 2001) with microclimate effects, e.g., changes in wind speeds, reaching over distances of up to 30 times the tree height (Böhm et al. 2014). Furthermore, long time periods are required to accommodate management activities (Balandier and Dupraz 1998; Lovell et al. 2018). Considering the limited resources in terms of land, labor, and funds; designing an agroforestry experiment, with enough replications or control treatments to allow for a robust statistical analysis, might not be feasible (Stamps and Linit 1998).

Hence, an alternative experimental design option involves investigating one or several individual agroforestry systems which are subsequently compared against an adjacent non-agroforestry system that serves as a control. The control site is expected to match the soil and microclimate conditions which can be unrealistic because of differences in topography, soil conditions and management history (Balandier and Dupraz 1998). When a paired experimental design is employed i.e., several pairs of sites are available, the analysis can be straightforward and involve e.g., a paired t-test. However, when data is collected within a single agroforestry site, there is no true replication and subsequent treatment randomization, which are prerequisites for agricultural experiments (Piepho et al. 2013).

Furthermore, to study the interactions between trees and crops or grassland within a single agroforestry system, the sampling strategy typically involves a point-transect design, where samples are collected in the tree row and at defined distances from the tree trunk in the arable or grassland strip. This approach leads to a hierarchical data structure characterized by a spatial autocorrelation within and between transects and to pseudo-replication if samples collected at different distances are treated as replicates (Stamps and Linit 1998). Thus, the assumptions for some of the commonly applied statistical tests such as analysis of variance (ANOVA) are violated, rendering the data analysis and subsequent interpretation invalid.

In such cases, multilevel models offer a practical alternative. Multilevel models such as marginal models (MM), generalized linear and linear mixed models (GLMM and LMM), and generalized additive mixed models (GAMM) constitute a family of models which are an extension to linear regression that can be used to correct for spatial and temporal autocorrelation (Zuur et al. 2009; Muff et al. 2016). Multilevel models have been successfully applied in ecology (Fitts et al. 2021; Meeussen et al. 2021), agriculture (Maaz et al. 2021; Kumar et al. 2021) and forestry (Hall and Bailey 2001; Chen et al. 2019). Here, we highlight their potential in agroforestry research through a comprehensive introduction to relevant literature.

To illustrate the key concepts presented in this Short Communication, we devised an easy-to-follow R script to be used alongside the crop yield data collected at our case-study agroforestry system (see Supplementary Material 1 for site description and additional explanations of the model set-up as well as the final conclusions). The script details discipline-specific approaches to multilevel model fitting and parameterization by focusing on data collected through transect sampling. Each section of this Short Communication is represented in the R script (as separate steps) to facilitate translation of words into code. It is important to note that the R script is meant as a companion to the text and the carefully selected references and not vice versa. It offers an opportunity to use real-life agroforestry data and is expected to be modified by the reader as they go through the literature and the supplementary material at their own pace, trialing different approaches to modelling agroforestry systems.

Multilevel models in agroforestry research in temperate Europe

A review of 23 relevant publications (for details on methods and the query structure, see Supplementary Material 2) showed that recent (2019–2022) studies often focused on silvoarable systems, investigating a range of bio-physico-chemical parameters, i.e., soil organic carbon, microbial communities, and crop yield (Table 1). In 65% of all studies, samples were taken in point-transects with higher sampling density closer to the tree strips.

Table 1 Research papers published between 2019 and 2022 highlighting different methodological approaches employed to study agroforestry systems in temperate Europe.

In 57% of the studies, multiple similar agroforestry systems across a given region were investigated, thereby constituting true replications. The preferred form of experimental control (70% of the total) involved an adjacent non-agroforestry system. Schmidt et al. (2021) emphasized the importance of comparing soil texture among sampling locations, treating it as an indicator of comparability of soil conditions between the control and the agroforestry plots. However, only a few studies reported comparability of soil conditions (e.g., soil texture) as well as distance to the control plot. No reviewed study used pure tree stands as a control for tree strips.

Multiple statistical approaches were utilized for data analysis, ranging from machine learning (e.g., Wengert et al. 2021) to classical statistics such as ANOVA (e.g., Beule and Karlovsky 2021), linear regression (e.g., Markwitz et al. 2020), and non-parametric statistical tests (e.g., Beule et al. 2020). Multilevel models were employed in less than 50% of the publications with authors acknowledging the importance of accounting for spatial autocorrelation when using the point-transect design (e.g., Pardon et al. 2017). However, we noted a lack of clearly defined model structures in the published material as well as the need for uniform terminology, which would facilitate reproducibility and transferability.

Selection and classification of variables at different spatiotemporal scales

Spatial and temporal scales are important in agroforestry research because the effects of trees on and the interactions with their surrounding environment intensify as the system matures (Fig. 1; Step 1 of the R script: Selection and classification of variables). For example, the drivers of soil carbon storage range from soil physico-chemistry at the micro-scale to topography and soil texture at the local scale, to climate and vegetation cover at the global scale (Wiesmeier et al. 2019). Selecting variables at an appropriate scale will aid in managing the noisy agroforestry data characterized by uncertainties, which persist even for well-established processes such as carbon sequestration (Mayer et al. 2022).

Fig. 1
figure 1

The effects of space and time on the structure of agroforestry systems can be relevant to the selection of the most suitable statistical modelling approach. There is a compounding of major challenges in data analysis that might entail the use of different statistical tools at different life stages of the system. Abbreviations for exemplar modelling approaches: MM marginal model, GLMM generalized linear mixed model (for non-Gaussian distribution), LMM linear mixed model, GAMM generalized additive model

Important considerations prior to data analysis

Multilevel models are parametric statistical techniques and thus, the assumptions of homogeneity of variance and normal distribution of residuals should be addressed during the analysis (Step 2 of the R script: Data exploration and linear regression fit). In addition to these routinely performed preliminary checks, a detailed data exploration should be carried out following e.g., Zuur et al.’s (2010) 10-step protocol. This step provides an additional benefit of exploring the relationships between variables. Non-linear relationships can be captured by e.g., higher-degree polynomial LMMs [see Slaets et al. (2021) for examples] or GAMMs with smoothing functions [see Hellmann et al. (2017) for examples]. Finally, to take full advantage of the flexibility offered by the multilevel models, it might be beneficial to consider suitable data distributions (e.g., Gaussian for continuous, Poisson for count and density, or Gamma for continuous positive value-only data) (Bolker et al. 2009), especially in studies involving count data where overdispersion is a concern [see Young et al. (1999) and Harrison (2014) for examples].

In multilevel models, explanatory variables can be classified as fixed or random, depending on the research question, which makes it difficult to formulate universal definitions for fixed and random terms [see Harrison et al. (2018) for an introduction to the topic of fixed and random variables as well as to mixed effects modelling in R]. For example, an agroforestry system might have two tree planting schemes with a high and a low planting density in alternating tree strips, within the same field. The planting density of tree strips can be thought of as an agronomic intervention i.e., a treatment, and thus constitute a fixed effect (an explanatory variable). Alternatively, tree strips can be treated as a grouping variable, which should have a minimum of five levels to allow for variance estimation (Harrison et al. 2018), and thus, constitute random effects. In agronomic experiments, Piepho et al. (2003) recommended for each experimental unit to be represented by a random effect in the model, i.e., random effects can represent individual plots, which can also be applicable to agroforestry systems, provided multiple measurements per plot were collected.

Accounting for spatially correlated data

When the data is scarce e.g., in an early stage of establishment of an agroforestry experiment (and provided that the sampling method permits), it might be prudent to correct for spatial autocorrelation through the application of marginal models (Step 3 of the R script: Fitting a marginal model). A marginal model is a regression model with only fixed effects and is used to analyze correlated data i.e., when there is a known dependency among residuals. These models estimate population-level (marginal) parameters and involve group wise correlations in contrast to multilevel models that use individual points for the estimation of random effects [see Pekár and Brabec (2016) for detailed examples of both, spatial and temporal correlations].

However, when the sampling scheme involves a hierarchical structure with nesting e.g., points are sampled within a transect which are nested within tree strips, including random effects (the variance–covariance structure) might be preferable (Step 4 of the R script: Fitting LME: Selecting an error structure—it is recommended to investigate an interactive map of the research site starting in line 389 which demonstrates the spatial patterns within individual transects). Fitting the most complex variance–covariance structure constitutes a standard recommendation (Barr 2013) but significantly increases the data requirement, especially for random slope models (Harrison et al. 2018) [see Step 5 of the R script: Fitting LME: A random-slope and intercept model for an example of non-convergence]. Thus, simplifying complex structures, e.g., through the inspection of variance components, might aid the model selection process [see Crawley (2012) for specific examples and Zuur et al. (2009) for a more general introduction to a 10-step protocol of model selection].

Parametrization of the variance–covariance structure

When a variance–covariance structure is fitted (Step 6 of the R script: Fitting LME: A random-intercept model), there is an assumption of covariance i.e., observations associated with the random effects are correlated (Bates et al. 2015). For example, when fitting a random effect of transect, all measurements taken within the same transect are considered equally correlated. However, this is rarely a realistic assumption, as Tobler’s Law states that “Everything is related to everything, but near things are more related than distant things.” (Tobler 1970). We can expect a stronger correlation between the plot closest to the tree strip and the plot located midway to the center of the crop strip than with the plot in the center. Thus, the parameterization of the covariance structure (e.g., through fitting of appropriate spatial correlation functions) should be considered a part of the model fitting procedure as described in Knörzer et al. (2010) and Slaets et al. (2021) (Step 7 of the R script: Fitting LME: Error structure parameterization). It is important to note that parametrization of the covariance structure precludes the use of summaries of the model fit i.e., computing of marginal and conditional R2 values (Nakagawa and Schielzeth 2013; Stoffel et al. 2021) because it ignores the spatiotemporal correlation and heterogeneity of variance [see Piepho (2019) for an alternative approach].

Improving quality assurance in European agroforestry research

As the interest in modern agroforestry systems continues to rise, it is essential to consider the different ways in which meaningful and appropriate data analysis can be carried out to inform agricultural practitioners and policy makers. This Short Communication highlights the potential of multilevel models and acts as a starting point for scientists who are considering applying multilevel models in an agroforestry context but are still familiarizing themselves with the key literature, terminology, and R syntax. Based on the absence of model diagnostics in the reviewed agroforestry literature, we recommend that future studies include testing of model adequacy through e.g., plotting standardized residuals against fitted values, against each covariate in the model and against each covariate not in the model, as well as using autocorrelation functions and/or variograms to assess independence of residuals when the data includes temporal or spatial aspects (Zuur and Ieno 2016; Step 8 of the R script: Model comparison and model diagnostics and Step 9: Avoiding hasty conclusions with visualizations). Making these diagnostic plots available alongside clearly defined model structures [see e.g., Beyer et al. (2022) for exemplar GLMM descriptions and Figure S3A-E in Supplementary Material 1 for exemplar diagnostic plots] will ensure higher transparency and confidence in the experimental results. This will safeguard agroforestry research from possible model misuse identified in other fields of research (discussed in e.g., Bolker et al. 2009; Dickey-Collas et al. 2014; Zurell et al. 2020).