Agriculture is a vibrant field of science, requiring up-to-date statistical methodology at both the design and analysis stages of experiments and studies. This special issue brings together recent developments in the field. There are a total of twelve papers, seven of which focus primarily on design and five on analysis. This balance between design and analysis is intentional as we believe both are equally important. Whilst we cover agriculture quite broadly, there are seven papers focused on plant breeding and variety testing, giving this area some prominence; most of these papers also have a focus on design.

A natural starting point for the design of experiments in agriculture is the excellent review by Verdooren (2020), who tracks developments from ancient times through to the work of Sir R. A. Fisher in 1926, which marks the start of modern statistical design. The paper then highlights some important advances in the design of agricultural experiments over the ensuing nearly 100 years. Included is a discussion of various designs with incomplete blocking structures, factorial and spatial designs, and software for design generation. Among the other papers with an emphasis on design, five deal with designs that are most commonly used in plant breeding and variety testing, even though the concepts and methods are clearly more broadly applicable. Bailey et al. (2020) construct efficient alternatives to the non-existent square lattice designs for 36 varieties and for up to eight replicates and study their properties. Whilst square lattice designs provide an optimal arrangement of \(v^{2}\) varieties in resolvable incomplete blocks of size \(v\), for \(v=6\) such designs cannot be constructed when there are more than three replicates. Designs are produced using both mathematical theory and design generation software. Edmondson (2020) proposes an interesting extension of classical blocked designs that allows multiple nested levels of blocking, thus addressing the common challenge that optimal block size is seldom known exactly beforehand. Several analysis strategies are also considered, including smoothing methods based on generalized additive models. Hoefler et al. (2020) describe a large-scale simulation study to investigate the performance of spatial designs relative to designs with more traditional blocking structures such as alpha, row-column and partially replicated designs. Designs are analysed and compared across multiple locations. The results provide important information for researchers when designing field experiments.
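
As a brief aside on the square lattice designs mentioned above, the following minimal sketch builds the three replicates that do exist for \(v=6\): blocks formed by the rows, by the columns, and by the letters of a cyclic Latin square on a \(6\times 6\) array of variety labels. It then checks that no pair of varieties shares a block more than once. The function names and the choice of a cyclic Latin square are assumptions made for illustration; this is not the construction used by Bailey et al. (2020), whose paper addresses the harder case of more than three replicates.

```python
from collections import Counter
from itertools import combinations


def square_lattice(v):
    """Three replicates of a square lattice for v*v varieties in blocks of size v.

    Varieties are labelled 0..v*v-1 and laid out in a v x v array.
    (Illustrative helper; the name is hypothetical.)
    """
    def variety(i, j):
        return i * v + j

    # Replicate 1: each row of the array is a block.
    rows = [[variety(i, j) for j in range(v)] for i in range(v)]
    # Replicate 2: each column of the array is a block.
    cols = [[variety(i, j) for i in range(v)] for j in range(v)]
    # Replicate 3: cells sharing a letter of the cyclic Latin square (i + j) mod v.
    latin = [[variety(i, (k - i) % v) for i in range(v)] for k in range(v)]
    return [rows, cols, latin]


def max_concurrence(design):
    """Largest number of blocks shared by any pair of varieties."""
    counts = Counter()
    for replicate in design:
        for block in replicate:
            counts.update(combinations(sorted(block), 2))
    return max(counts.values())


design = square_lattice(6)
print(max_concurrence(design))
```

A fourth replicate of this kind would require a second Latin square orthogonal to the first, and no pair of orthogonal Latin squares of order 6 exists, which is why the classical construction stops at three replicates for 36 varieties.
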
Two further papers consider designs when population structure and genetic relatedness among breeding lines are taken into account. Whereas classical design approaches are based on linear models with fixed treatment effects, these two papers model treatments (genotypes) as random. Cullis et al. (2020) propose a novel approach to optimize design for a given set of genotypes in early-stage plant breeding trials when genetic relatedness is modelled using a marker-based kinship matrix. The authors also provide freely available software to generate designs using their proposed methods, which should be of broad interest to plant breeders and agronomists. Heslot and Feoktistov (2020) tackle the problem of selecting a subset of genotypes for phenotyping when the objective is to perform genomic prediction for a larger related set of genotypes. Design optimization is done using an evolutionary algorithm. This key problem in the optimization of breeding programmes is solved for three cases, i.e. (i) optimization of selective phenotyping of available individuals, (ii) optimization of hybrid testing and (iii) optimization of designs for genetically connected crosses. Finally, Huang et al. (2020) explore strategies for the design of experiments to estimate nonlinear regression models suitable to assess extended Michaelis–Menten kinetics in biochemical reactions involving enzymes and substrates. The authors focus on applications with several controllable inputs and consider optimal designs based on multifactor hybrid nonlinear models, which are computationally rather challenging. Among other things, they study a compound design criterion for discriminating between two candidate models, which they recommend for the design of advanced kinetic studies.
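
To give a flavour of optimal design for kinetic models, the sketch below treats only the basic single-substrate Michaelis–Menten model \(\eta(s) = V_{\max}\, s/(K_m + s)\), not the extended multifactor models studied by Huang et al. (2020). It assumes an equal-weight two-point design with one point fixed at the upper limit of the substrate range, and searches a grid for the second point that maximizes the determinant of the information matrix (local D-optimality). The parameter values, grid size and function names are hypothetical choices for illustration.

```python
import numpy as np


def d_criterion(s1, s2, vmax, km):
    """Determinant of the information matrix of an equal-weight two-point design."""
    s = np.array([s1, s2], dtype=float)
    # Jacobian of eta(s) = vmax * s / (km + s) with respect to (vmax, km).
    jac = np.column_stack([s / (km + s), -vmax * s / (km + s) ** 2])
    info = jac.T @ jac / 2.0
    return np.linalg.det(info)


def best_lower_point(vmax, km, s_max, n_grid=4001):
    """Grid search for the lower design point, with the upper point fixed at s_max."""
    grid = np.linspace(0.0, s_max, n_grid)
    values = [d_criterion(s, s_max, vmax, km) for s in grid]
    return grid[int(np.argmax(values))]


# Hypothetical parameter guesses; a locally optimal design depends on them.
vmax, km, s_max = 1.0, 1.0, 4.0
numeric = best_lower_point(vmax, km, s_max)
# Closed form obtained by maximizing s1 * (s_max - s1) / (km + s1)**2 over s1.
analytic = km * s_max / (2.0 * km + s_max)
print(numeric, analytic)
```

The dependence of the optimal point on the unknown \(K_m\) is typical of nonlinear models and is one reason why design problems of this kind become computationally demanding as the number of inputs and parameters grows.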

Structural equation models (SEM) have been increasingly utilized for data analyses involving agricultural production systems to infer causality or directional relationships between economically important traits. However, the typical assumption of SEM is that these relationships are uniform across production environments. Chitakasempornkul et al. (2020) propose a Bayesian model to allow for heterogeneity in these relationships between six economically important reproduction traits in a swine production system. They demonstrate that these heterogeneous specifications provide substantial improvements in model fit in their application relative to the typical SEM specifications. Generalized linear mixed models (GLMM) are a broad class of models popular among agricultural statisticians. They are routinely used to analyse many different observational studies and designed experiments involving discrete outcomes. The prevailing consensus has been that integral approximation methods (e.g. Gauss–Hermite quadrature) should generally be preferred to linearization-based methods (e.g. pseudo-likelihood) for estimating variance components in GLMM. Using extensive simulations representative of agricultural experimental designs, Stroup and Claassen (2020) demonstrate the folly of that general recommendation and provide some guidelines for making a suitable choice between the two GLMM estimation strategies. Lewis-Beck et al. (2020) look at the use of remote sensing data for monitoring crop phenology and development within and across seasons. They nicely illustrate the use of a functional data approach in a spatio-temporal setting, accounting for spatial dependence between locations through the functional curve coefficients. Modelling across multiple growing years, and including growing degree days as a covariate, the authors estimate when crops reach their peak each season in the US corn belt.
The remaining two papers deal with the analysis of field trials using spatial methods, a field of application that also featured prominently in several of the design-related papers. Boer et al. (2020) explore the links between three popular models for spatial analysis, i.e. the linear variance model, the random walk model and P-splines with first-differences penalties. They discuss in which settings these three models are equivalent and when they differ. The comparison provides new perspectives for further developments based on P-splines. Continuing on the same theme, Mao et al. (2020) propose a strategy for accurate estimation of genetic effects for genomic prediction in plant breeding populations whilst accounting for spatial (co)variability within fields. They additionally model subpopulation effects, not fully accounted for by genetic markers, as a third source of (co)variability. Their Gaussian random field model jointly accounts for these three sources of (co)variability using Gaussian kernel specifications. They demonstrate the competitiveness of their proposed method with other recently developed methods on publicly available datasets involving comparisons of maize and wheat varieties. We note that the latter two papers consider first differences among neighbouring plots in their spatial variance–covariance structures, but in different ways as regards (i) the estimation or definition of variance parameters, (ii) the handling of the singularity in the associated precision matrix, and (iii) whether the matrix related to first differences enters the variance matrix assumed for the data or the precision matrix (generalized inverse of the variance matrix).
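
The singularity mentioned above can be seen in a small generic example, which is not the formulation of either paper. For a hypothetical row of \(n\) plots, the sketch builds the first-difference matrix \(D\), shows that the precision matrix \(D^{\top}D\) has rank \(n-1\) with constant vectors in its null space, and compares two common ways of obtaining a usable variance matrix: taking a generalized inverse, or anchoring the process as a random walk started at zero, whose variance matrix has entries \(\min(i,j)\).

```python
import numpy as np

n = 6  # hypothetical number of plots along one row

# (n-1) x n first-difference matrix: row i has -1 in column i and +1 in column i+1.
D = np.diff(np.eye(n), axis=0)

# Precision matrix implied by independent unit-variance first differences.
P = D.T @ D
rank = np.linalg.matrix_rank(P)  # n - 1: the overall level is not identified

# Route A: use the Moore-Penrose generalized inverse as the variance matrix.
V_ginv = np.linalg.pinv(P)

# Route B: anchor the process as a random walk started at zero,
# giving the non-singular variance matrix with entries min(i, j).
idx = np.arange(1, n + 1)
V_rw = np.minimum.outer(idx, idx).astype(float)
P_rw = np.linalg.inv(V_rw)

# The anchored precision differs from P only in its first diagonal element.
anchor = np.zeros((n, n))
anchor[0, 0] = 1.0
print(rank, np.allclose(P_rw, P + anchor))

# Both routes imply uncorrelated unit-variance first differences.
print(np.allclose(D @ V_ginv @ D.T, np.eye(n - 1)),
      np.allclose(D @ V_rw @ D.T, np.eye(n - 1)))
```

Both routes agree on the distribution of differences between neighbouring plots and differ only in how the unidentified overall level is handled, which is one way in which models built on first differences can coincide in some settings and diverge in others.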

In summary, the manuscripts provide excellent examples of the importance of statistical research related to applications in the agricultural sciences. It goes without saying that this special issue can provide but a very small glimpse of the diverse and interesting opportunities and challenges that agriculture provides for statisticians. We are hopeful that the exciting work presented in this special issue spawns further contributions to the development and adaptation of statistical methodology for agricultural research.