GIS-FA: an approach to integrating thematic maps, factor-analytic, and envirotyping for cultivar targeting

  • Original Article
  • Theoretical and Applied Genetics

Abstract

Key message

We propose an “enviromics” prediction model for recommending cultivars based on thematic maps aimed at decision-makers.

Abstract

Parsimonious methods that capture genotype-by-environment interaction (GEI) in multi-environment trials (MET) are important in breeding programs. Understanding the causes and factors of GEI allows the utilization of genotype adaptations in the target population of environments through environmental features and factor-analytic (FA) models. Here, we present a novel predictive breeding approach called GIS-FA, which integrates geographic information systems (GIS) techniques, FA models, partial least squares (PLS) regression, and enviromics to predict phenotypic performance in untested environments. The GIS-FA approach enables: (i) the prediction of the phenotypic performance of tested genotypes in untested environments, (ii) the selection of the best-ranking genotypes based on their overall performance and stability using the FA selection tools, and (iii) the creation of thematic maps showing overall or pairwise performance and stability for decision-making. We exemplify the usage of the GIS-FA approach using two datasets of rice [Oryza sativa (L.)] and soybean [Glycine max (L.) Merr.] in MET spread over tropical areas. In summary, our novel predictive method allows the identification of new breeding scenarios by pinpointing groups of environments where genotypes demonstrate superior predicted performance. It also facilitates and optimizes cultivar recommendations by utilizing thematic maps.

Data availability

The R code and both datasets used in this study are freely available at https://github.com/Kaio-Olimpio/GIS-FA. The Supplementary Material contains a detailed tutorial with a commented script describing the steps for performing the GIS-FA analysis with the soybean dataset.

Acknowledgements

This work was supported by the Minas Gerais State Agency for Research and Development (FAPEMIG), the Brazilian National Council for Scientific and Technological Development (CNPq), the Coordination for the Improvement of Higher Education Personnel (CAPES), the Mato Grosso do Sul Foundation (Fundação MS), the Brazilian Agricultural Research Corporation (Embrapa Rice and Beans), and the Federal University of Viçosa (UFV).

Funding

This research was supported by the Minas Gerais State Agency for Research and Development (Fundação de Amparo à Pesquisa do Estado de Minas Gerais, FAPEMIG), the Coordination for the Improvement of Higher Education Personnel (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, CAPES), and the Brazilian National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq).

Author information

Contributions

M.S.A., S.F.S.C., and K.O.G.D. conceived the research. M.S.A. and S.F.S.C. executed the statistical analyses and drafted the initial manuscript. M.D.K. and G.C.N. provided insights into the methodology. L.A.S.D., F.M.F., G.R.P., R.S.A., P.C.S.C., M.D.K., and G.C.N. provided critical revisions of the paper drafts. A.R.G.B. provided knowledge on the structure of the soybean dataset, while A.B.H. and F.B. provided information about the rice dataset. M.S.A., S.F.S.C., and M.D.K. built the tutorial available in the Supplementary Material. All authors approved the final version of the manuscript.

Corresponding author

Correspondence to Kaio O. G. Dias.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by Hiroyoshi Iwata.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Partial least squares regression

Here, we employed the kernel PLS algorithm (Lindgren et al. 1993; Dayal and MacGregor 1997) to predict the factor loadings of untested environments. Details about this algorithm are presented below:

Take the following multiple regressions as a starting point:

$$\begin{aligned} \hat{\boldsymbol{\Lambda }}^\star = \textbf{W} \textbf{B} + \textbf{E} \end{aligned}$$
(A1)

where \(\hat{\boldsymbol{\Lambda }}^\star\) is the \(J \times K\) matrix of K rotated loadings for the J observed environments, \(\textbf{W}\) is a \(J \times P\) matrix of scaled values for P environmental features in the J observed environments, \(\textbf{B}\) is a \(P \times K\) matrix of regression coefficients, and \(\textbf{E}\) is a \(J \times K\) matrix of lack-of-fit effects. Most of the environmental features are correlated (Supplementary Figure 4), so \(\textbf{W}\) suffers from multicollinearity: \(\textbf{W}^\prime \textbf{W}\) is singular or nearly so, and the least-squares estimator \(\textbf{B} = (\textbf{W}^\prime \textbf{W})^{-1} \textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star\) does not yield a stable solution. To overcome this issue, we employed kernel PLS regression to replace \(\textbf{B}\) with \(\textbf{B}^\star\), using the following equation:

$$\begin{aligned} \textbf{B}^\star = \boldsymbol{\Phi } (\boldsymbol{\Theta }^\prime \boldsymbol{\Phi })^{-1} \boldsymbol{\Xi }^\prime \end{aligned}$$
(A2)

where \(\boldsymbol{\Phi }\) is a \(P \times C\) matrix of weights for \(\textbf{W}\) (\(\boldsymbol{\Phi } = \{\boldsymbol{\phi }_1 \, \boldsymbol{\phi }_2 \, \ldots \boldsymbol{\phi }_C \}\)), with C being the number of PLS components; \(\boldsymbol{\Theta }\) is a matrix of loadings for \(\textbf{W}\) (\(\boldsymbol{\Theta } = \{\boldsymbol{\theta }_1 \, \boldsymbol{\theta }_2 \, \ldots \boldsymbol{\theta }_C \}\)) with the same dimensions as \(\boldsymbol{\Phi }\); and \(\boldsymbol{\Xi }\) is a \(K \times C\) matrix of weights for \(\hat{\boldsymbol{\Lambda }}^\star\) (\(\boldsymbol{\Xi } = \{\boldsymbol{\xi }_1 \boldsymbol{\xi }_2 \ldots \boldsymbol{\xi }_C\}\)). We describe the CV procedure that defined the number of components (\(c = 1, 2, \ldots , C\)) in section Spatial predictions in the breeding zone.

\(\boldsymbol{\Phi }\), \(\boldsymbol{\Theta }\), and \(\boldsymbol{\Xi }\) were obtained through an iterative process with C iterations, one per component, that relies on the kernel matrices \(\textbf{W}^\prime \textbf{W}\) and \(\textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star\). In iteration c, \(\boldsymbol{\phi }_c\) is estimated as the eigenvector associated with the largest eigenvalue of the kernel \(\textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star \hat{\boldsymbol{\Lambda }}^{\star ^\prime } \textbf{W}\), computed from the current cross-product matrix \((\textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star )_c\) of Eq. (A4). Let \(\textbf{R} = \boldsymbol{\Phi } (\boldsymbol{\Theta }^\prime \boldsymbol{\Phi })^{-1}\), with \(\textbf{R} = \{\textbf{r}_1 \; \textbf{r}_2 \; \dots \; \textbf{r}_C\}\). In the first iteration, \(\textbf{r}_1 = \boldsymbol{\phi }_1\). Subsequently, \(\textbf{r}_c = \boldsymbol{\phi }_c - \sum _{j=1}^{c-1} (\boldsymbol{\theta }_{j}^\prime \boldsymbol{\phi }_c) \, \textbf{r}_{j}\), where each \(\boldsymbol{\theta }_{j}^\prime \boldsymbol{\phi }_c\) is a scalar. On each iteration, \(\boldsymbol{\theta }_c\) and \(\boldsymbol{\xi }_c\) are estimated as follows:

$$\begin{aligned} \boldsymbol{\theta }_c^\prime = \frac{ \textbf{r}_c^\prime (\textbf{W}^\prime \textbf{W}) }{ \textbf{r}_c^\prime (\textbf{W}^\prime \textbf{W}) \textbf{r}_c } \quad \boldsymbol{\xi }_c^\prime = \frac{ \textbf{r}_c^\prime (\textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star ) }{ \textbf{r}_c^\prime (\textbf{W}^\prime \textbf{W}) \textbf{r}_c } \end{aligned}$$
(A3)

The solutions of these equations are stored as the c-th columns of \(\boldsymbol{\Theta }\) and \(\boldsymbol{\Xi }\), respectively, and are used to deflate the cross-product matrix \(\textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star\) for the next iteration as follows:

$$\begin{aligned} (\textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star )_{c+1} = (\textbf{W}^\prime \hat{\boldsymbol{\Lambda }}^\star )_{c} - \boldsymbol{\theta }_c \boldsymbol{\xi }_c^\prime [(\textbf{W} \textbf{r}_c)^\prime \textbf{W} \textbf{r}_c] \end{aligned}$$
(A4)

When the iterative process is finished, \(\textbf{B}^\star\) provides a proper solution to Eq. (A1) and can be used for prediction purposes. We used \(\textbf{B}^\star\) in Eq. (17) to train the PLS model and in Eq. (18) to make predictions.
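The steps in Eqs. (A2) to (A4) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' R implementation (which is available in the repository cited under Data availability). The function name `kernel_pls`, the variable names, and the synthetic data are assumptions made for this example. The sketch also assumes that the weight vector is recomputed from the deflated cross-product matrix in every iteration, and that \(\textbf{r}_c\) is obtained by subtracting the contributions of all previous components, following the improved kernel algorithm of Dayal and MacGregor (1997).

```python
import numpy as np


def kernel_pls(W, Lam, n_components):
    """Kernel PLS sketch. W is J x P (scaled features), Lam is J x K (loadings).

    Returns B_star (P x K) plus Phi, Theta, Xi as defined for Eq. (A2).
    """
    W = np.asarray(W, dtype=float)
    Lam = np.asarray(Lam, dtype=float)
    P, K = W.shape[1], Lam.shape[1]
    WtW = W.T @ W    # P x P kernel, never deflated
    WtL = W.T @ Lam  # P x K cross-product, deflated on each iteration (A4)
    Phi = np.zeros((P, n_components))
    Theta = np.zeros((P, n_components))
    Xi = np.zeros((K, n_components))
    R = np.zeros((P, n_components))
    for c in range(n_components):
        # Dominant eigenvector of (W'Lam)(W'Lam)' is the first left
        # singular vector of the current W'Lam.
        u, _, _ = np.linalg.svd(WtL, full_matrices=False)
        phi = u[:, 0]
        # r_c = phi_c - sum_j (theta_j' phi_c) r_j over previous components
        r = phi.copy()
        for j in range(c):
            r -= (Theta[:, j] @ phi) * R[:, j]
        tt = r @ WtW @ r             # scalar (W r_c)'(W r_c)
        theta = (WtW @ r) / tt       # Eq. (A3), loading for W
        xi = (WtL.T @ r) / tt        # Eq. (A3), weight for Lam
        WtL = WtL - tt * np.outer(theta, xi)  # Eq. (A4), deflation
        Phi[:, c], Theta[:, c], Xi[:, c], R[:, c] = phi, theta, xi, r
    B_star = R @ Xi.T                # Eq. (A2), since R = Phi (Theta' Phi)^-1
    return B_star, Phi, Theta, Xi


# Hypothetical data: six strongly collinear features driven by two signals.
rng = np.random.default_rng(0)
J, P, K = 40, 6, 3
base = rng.normal(size=(J, 2))
extra = base @ rng.normal(size=(2, P - 2)) + 1e-6 * rng.normal(size=(J, P - 2))
W_demo = np.hstack([base, extra])
W_demo = (W_demo - W_demo.mean(axis=0)) / W_demo.std(axis=0)
Lam_demo = base @ rng.normal(size=(2, K)) + 0.05 * rng.normal(size=(J, K))

print("cond(W'W):", np.linalg.cond(W_demo.T @ W_demo))
B_demo, *_ = kernel_pls(W_demo, Lam_demo, n_components=2)
rmse = np.sqrt(np.mean((Lam_demo - W_demo @ B_demo) ** 2))
print("in-sample RMSE with 2 components:", rmse)
```

In this toy setting \(\textbf{W}^\prime \textbf{W}\) is badly conditioned, so inverting it directly would be unreliable, whereas the PLS coefficients are built from a small number of components and never require that inverse. When the number of components equals the column rank of a well-conditioned \(\textbf{W}\), the sketch reproduces the ordinary least-squares coefficients, which is a convenient sanity check.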

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Araújo, M.S., Chaves, S.F.S., Dias, L.A.S. et al. GIS-FA: an approach to integrating thematic maps, factor-analytic, and envirotyping for cultivar targeting. Theor Appl Genet 137, 80 (2024). https://doi.org/10.1007/s00122-024-04579-z

  • DOI: https://doi.org/10.1007/s00122-024-04579-z
