
The effects of encoding data in diversity studies and the applicability of the weighting index approach for data analysis from different molecular markers

  • Original Article
Plant Systematics and Evolution

Abstract

The use of molecular markers to study genetic diversity represents a breakthrough in this area because of their higher levels of polymorphism and their phenotypic neutrality. Codominant markers, such as microsatellites (SSR), are sensitive enough to distinguish heterozygotes in genetic studies. Despite this advantage, some studies ignore this feature and work with encoded data because evaluation is simpler, because the species is polyploid, or because different types of molecular markers must be analyzed together. Our study therefore investigates the consequences of these encodings on simulated and real data. In addition, we suggest an alternative analysis for genetic evaluations that use different molecular markers. For the simulated data, we considered two scenarios: the first uses SNP markers and the second uses SSR markers. For the real data, we used SSR genotyping data from Coffea canephora accessions maintained in the Embrapa Germplasm Collection. Genetic diversity was studied using cluster analysis, the dissimilarity index, and the Bayesian approach implemented in the STRUCTURE software. For the simulated data, we observed a loss of genetic information in the encoded data in both scenarios. The same result was observed with the coffee data. This loss of information is discussed in the context of a plant-breeding program, and its consequences are weighed for germplasm evaluation and the selection of parents for hybridization. For studies that involve different types of markers, we discuss an alternative to the combined analysis in which the informativeness, coverage and quality of the markers are weighted.
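The loss of information described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the genotypes are hypothetical, the binary encoding (presence or absence of one allele) is only one possible encoding, and the simple-matching dissimilarity is an assumption chosen for brevity rather than the index used in the article.

```python
from itertools import combinations

# Hypothetical genotypes at three biallelic loci (alleles "A" and "B").
# These are illustrative values, not data from the study.
genotypes = {
    "acc1": ["AA", "AB", "BB"],
    "acc2": ["AB", "AA", "BB"],
    "acc3": ["BB", "BB", "AA"],
}

def codominant_code(genotype):
    """Count copies of allele A (0, 1 or 2), so heterozygotes stay distinct."""
    return genotype.count("A")

def binary_code(genotype):
    """Presence/absence of allele A: AA and AB both collapse to 1."""
    return 1 if "A" in genotype else 0

def mismatch(x, y):
    """Fraction of loci whose codes differ (assumed simple-matching dissimilarity)."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def pairwise(encode):
    """Dissimilarity for every pair of accessions under a given encoding."""
    coded = {name: [encode(g) for g in loci] for name, loci in genotypes.items()}
    return {(a, b): mismatch(coded[a], coded[b]) for a, b in combinations(coded, 2)}

codominant = pairwise(codominant_code)
binary = pairwise(binary_code)

for pair in codominant:
    print(pair, "codominant:", round(codominant[pair], 2), "binary:", round(binary[pair], 2))
```

In this toy data set, acc1 and acc2 differ at two of three loci when heterozygotes are scored, yet they become indistinguishable once the data are reduced to presence or absence, which is the kind of information loss the study evaluates.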




Acknowledgments

The authors thank Dr. Romário G. Ferrão, Dr. Abraão C. Verdin-Filho and Paulo Volpi for providing additional coffee samples from the Capixaba Institute for Research, Technical Assistance and Rural Extension (Instituto Capixaba de Pesquisa, Assistência Técnica e Extensão Rural, Incaper). We also thank Milton M. Santos, João Maria Diocleciano and Gilvan O. Ferro for technical support at the Embrapa Experimental Station in Rondônia, and Rejane L. Freitas, Telma Fallieri and Tesfahun A. Sotetaw for technical support at the UFV laboratory in Viçosa. This work was financially supported by Consórcio Brasileiro de Pesquisa e Desenvolvimento do Café, Agrofuturo-Embrapa and the National Council for Scientific and Technological Development (CNPq).

Author information

Corresponding author

Correspondence to Eveline T. Caixeta.

About this article

Cite this article

Ferrão, L.F.V., Caixeta, E.T., Cruz, C.D. et al. The effects of encoding data in diversity studies and the applicability of the weighting index approach for data analysis from different molecular markers. Plant Syst Evol 300, 1649–1661 (2014). https://doi.org/10.1007/s00606-014-0990-3

  • DOI: https://doi.org/10.1007/s00606-014-0990-3
