Abstract
Genomic prediction in coffee breeding has shown good potential in terms of predictive ability (PA), genetic gain, and shortening of the selection cycle. However, genotyping cost is prohibitive for many species and depends on the density of the marker panel used. An optimized marker panel density may reduce genotyping cost and improve PA. We aimed to evaluate the trade-off between marker panel density and PA for eight agronomic traits in Coffea canephora using machine learning algorithms, which were compared with the Bayesian LASSO (BLASSO) method. The data set consisted of 165 C. canephora genotypes genotyped with 14,387 SNP markers. The plants were phenotyped for vegetative vigor (Vig), rust incidence (Rus), cercosporiose incidence (Cer), fruit maturation time (Mat), fruit size (FS), plant height (PH), diameter of the canopy projection (DC), and yield (Y). Twelve marker panels of different densities were evaluated. Overall, PA increased as the number of markers decreased, peaking when between 500 and 1,000 markers were used. Comparing the worst result (full SNP panel) with the best result for each trait, some traits improved by around 100% (PH: 0.35–0.77; Cer: 0.40–0.84; DC: 0.39–0.82; Rus: 0.39–0.83; Vig: 0.40–0.77), whereas the others improved by more than 340% (Mat: 0.12–0.60; Y: 0.14–0.61; FS: 0.07–0.60). These results indicate that reducing the number of markers can improve the selection of individuals at a lower cost.
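The percentage improvements quoted in the abstract can be reproduced approximately from the reported PA ranges. The sketch below is illustrative only: it assumes that the first value in each range is the PA obtained with the full SNP panel and the second is the PA of the best reduced panel, and the names `reported_pa` and `relative_gain` are hypothetical, not taken from the study.

```python
# PA values as reported in the abstract: (full SNP panel, best reduced panel).
# Assumption: the first value of each range is the full-panel (worst) result.
reported_pa = {
    "PH": (0.35, 0.77),
    "Cer": (0.40, 0.84),
    "DC": (0.39, 0.82),
    "Rus": (0.39, 0.83),
    "Vig": (0.40, 0.77),
    "Mat": (0.12, 0.60),
    "Y": (0.14, 0.61),
    "FS": (0.07, 0.60),
}


def relative_gain(full_panel_pa, best_panel_pa):
    """Percentage improvement of the best panel over the full panel."""
    return 100.0 * (best_panel_pa - full_panel_pa) / full_panel_pa


# Relative gain per trait, computed from the rounded abstract values.
gains = {
    trait: relative_gain(full, best)
    for trait, (full, best) in reported_pa.items()
}

# Print traits from smallest to largest relative gain.
for trait, gain in sorted(gains.items(), key=lambda item: item[1]):
    print(f"{trait}: {gain:.0f}%")
```

Because the abstract reports PA to two decimal places, these percentages are approximate and may differ by several points from values computed on the unrounded data.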
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
This work was financially supported by the Brazilian Coffee Research and Development Consortium (CBP&D/Café), the National Institute of Science and Technology of Coffee (INCT-Café), the Foundation for Research Support of the State of Minas Gerais (FAPEMIG), the National Council of Scientific and Technological Development (CNPq), and the Coordination for the Improvement of Higher Education Personnel (CAPES)—Finance code 001. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
Conceptualization: M.N., I.C.S. and C.A.V.B.; Data curation: E.T.C. and E.R.A.; Formal analysis: I.C.S., C.A.V.B. and M.N.; Funding acquisition: E.T.C. and M.N.; Investigation: I.C.S., C.A.V.B., E.T.C., A.C.C.N., C.F.A. and M.N.; Methodology: M.N. and I.C.S.; Supervision: M.N., E.T.C., A.C.C.N. and C.F.A.; Validation: I.C.S. and C.A.V.B.; Visualization: I.C.S., C.A.V.B., E.T.C., A.C.C.N., C.F.A., E.R.A. and M.N.; Writing (original draft): I.C.S. and C.A.V.B.; Writing (review and editing): M.N., E.T.C., A.C.C.N. and C.F.A.
Ethics declarations
Conflict of interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
de Sousa, I.C., Barreto, C.A.V., Caixeta, E.T. et al. The trade-off between density marker panels size and predictive ability of genomic prediction for agronomic traits in Coffea canephora. Euphytica 220, 46 (2024). https://doi.org/10.1007/s10681-024-03303-8
DOI: https://doi.org/10.1007/s10681-024-03303-8