Population-tailored mock genome enables genomic studies in species without a reference genome

  • Original Article
Molecular Genetics and Genomics

Abstract

Based on molecular markers, genomic prediction enables us to speed up breeding schemes and increase the response to selection. Several high-throughput genotyping platforms can deliver thousands of molecular markers for genomic studies. However, even though genomic prediction is widely applied in plant breeding, species without a reference genome cannot fully benefit from genomic tools and modern breeding schemes. We used a method to assemble a population-tailored mock genome to call single-nucleotide polymorphism (SNP) markers without an available reference genome, and for the first time, we compared the results with standard genotyping platforms (array and genotyping-by-sequencing (GBS) using a reference genome) for performance in genomic prediction models. Our results indicate that using a population-tailored mock genome to call SNPs delivers reliable estimates of the genomic relationship between genotypes. Furthermore, genomic prediction estimates were comparable to standard approaches, especially when considering only additive effects. Mock genomes were slightly worse than arrays at predicting traits influenced by dominance effects, but still performed as well as standard GBS methods that use a reference genome. Overall, the array-based SNP marker methods achieved the best predictive ability and reliability in estimating variance components. Even so, mock genomes can be a worthy alternative for genomic selection studies, especially for species where a reference genome is not available.
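
To make the idea of comparing "genomic relationship between genotypes" across marker sets concrete, here is a minimal sketch. It is not the authors' pipeline. It assumes an additive genomic relationship matrix of the commonly used centred-dosage form, G = ZZ' / (2 Σ p(1 − p)), and it uses simulated allele dosages. The function names `vanraden_grm` and `offdiag_correlation` and the toy marker subset are hypothetical choices made for illustration.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Additive genomic relationship matrix from an
    (n_individuals, n_markers) array of allele dosages coded 0, 1, 2."""
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0            # observed allele frequency per marker
    keep = (p > 0.0) & (p < 1.0)        # drop monomorphic markers
    M, p = M[:, keep], p[keep]
    Z = M - 2.0 * p                     # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / denom


def offdiag_correlation(G1, G2):
    """Pearson correlation between the off-diagonal elements of two
    relationship matrices built for the same individuals."""
    iu = np.triu_indices_from(G1, k=1)
    return float(np.corrcoef(G1[iu], G2[iu])[0, 1])


# Simulated data (assumption): 20 individuals, 500 independent SNPs.
rng = np.random.default_rng(0)
n_ind, n_snp = 20, 500
freqs = rng.uniform(0.1, 0.9, n_snp)
dosages = rng.binomial(2, freqs, size=(n_ind, n_snp))

# Stand-ins for two genotyping approaches: all markers vs. a random half.
subset = rng.choice(n_snp, size=n_snp // 2, replace=False)
G_full = vanraden_grm(dosages)
G_sub = vanraden_grm(dosages[:, subset])

print("off-diagonal correlation:", offdiag_correlation(G_full, G_sub))
```

In a real comparison the two dosage matrices would come from the different SNP-calling approaches applied to the same individuals, and a high off-diagonal correlation would suggest that both marker sets capture similar relationships.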

Data availability

The datasets are available at https://data.mendeley.com/datasets/4nccgtcpgn/1.

Code availability

The code is available at https://data.mendeley.com/datasets/4nccgtcpgn/1.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)—Process 2017/24327-0.

Author information

Corresponding author

Correspondence to Felipe Sabadin.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Bing Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 2593 kb)

About this article

Cite this article

Sabadin, F., Carvalho, H.F., Galli, G. et al. Population-tailored mock genome enables genomic studies in species without a reference genome. Mol Genet Genomics 297, 33–46 (2022). https://doi.org/10.1007/s00438-021-01831-9

  • DOI: https://doi.org/10.1007/s00438-021-01831-9
