Comparing assignment-based approaches to breed identification within a large set of horses

  • Animal Genetics • Original Paper
Journal of Applied Genetics

Abstract

Given the extensive data sets and statistical techniques it relies on, animal breeding increasingly draws on machine learning, which has a steadily growing impact on the field. This study examines the potential of machine learning and data mining for breed identification within a large set of horses and breeds. Individual assignment methods, and the factors influencing their success rate, are compared at the scale of the Czech horse population. Per-locus fixation index (FST) values ranged from 0.057 (HMS1) to 0.144 (HTG6), and overall genetic differentiation among the breeds amounted to 8.9%. The highest genetic divergence (FST = 0.378) was found between the Friesian and Equus przewalskii, the highest level of gene flow between the Czech and Bavarian Warmblood (Nm = 14.302), and the overall heterozygote deficit across all populations was 10.4%. Eight standard assignment methods (Bayesian, frequency-based, and distance-based) implemented in the GeneClass software and most mainstream classification algorithms (Bayes Net, Naive Bayes, IB1, IB5, KStar, JRip, J48, Random Forest, Random Tree, PART, MLP, and SVM) from the WEKA machine learning workbench were compared on 314,874 real allelic data points. The Bayesian method (GeneClass, 89.9%) and the Bayesian network algorithm (WEKA, 84.8%) outperformed the other techniques. Breed prediction accuracy was highest in the cold-blooded horses. The overall proportion of individuals correctly assigned to their population depended mainly on the number of breeds and their genetic divergence. These statistical tools could be used in breed traceability systems and have the potential to assist breed managers in decisions about breeding and registration.
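The abstract reports fixation indices, an overall differentiation among breeds, and a global heterozygote deficit. As a minimal sketch of how these quantities relate, the code below uses Nei's (1973b) gene-diversity decomposition at a single locus: F_ST = (H_T - H_S)/H_T, F_IS = (H_S - H_O)/H_S and F_IT = (H_T - H_O)/H_T. It assumes known allele frequencies and equal population weights, and it is not the study's procedure: programs such as FSTAT report the Weir and Cockerham (1984) estimator, which additionally accounts for sampling. The function names and example frequencies are hypothetical.

```python
from typing import Dict, List, Tuple

Freqs = Dict[str, float]  # allele -> frequency at one locus


def expected_het(freqs: Freqs) -> float:
    """Hardy-Weinberg expected heterozygosity, 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def f_statistics(pops: List[Freqs], h_obs: List[float]) -> Tuple[float, float, float]:
    """Nei's (1973) gene-diversity F-statistics for one locus.

    pops  -- allele frequencies of each population (equal weights assumed)
    h_obs -- observed heterozygosity of each population
    Returns (F_IS, F_ST, F_IT).
    """
    k = len(pops)
    h_o = sum(h_obs) / k                               # mean observed heterozygosity
    h_s = sum(expected_het(p) for p in pops) / k       # mean within-population diversity
    alleles = set().union(*pops)                       # every allele seen in any population
    pooled = {a: sum(p.get(a, 0.0) for p in pops) / k for a in alleles}
    h_t = expected_het(pooled)                         # total (pooled) gene diversity
    f_is = (h_s - h_o) / h_s    # heterozygote deficit within populations
    f_st = (h_t - h_s) / h_t    # share of diversity lying between populations
    f_it = (h_t - h_o) / h_t    # heterozygote deficit relative to the total
    return f_is, f_st, f_it


# Hypothetical two-breed, two-allele example (not data from the study).
f_is, f_st, f_it = f_statistics(
    [{"A": 0.8, "B": 0.2}, {"A": 0.4, "B": 0.6}], h_obs=[0.30, 0.45]
)
print(f"F_IS={f_is:.4f}  F_ST={f_st:.4f}  F_IT={f_it:.4f}")
```

The three indices satisfy (1 - F_IT) = (1 - F_IS)(1 - F_ST), which is a quick consistency check on any implementation. Read this way, an overall F_ST of 0.089 means that 8.9% of the total gene diversity lies between breeds.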

Fig. 1

References

  • Baudouin L, Lebrun P (2000) An operational Bayesian approach for the identification of sexually reproduced cross-fertilized populations using molecular markers. Acta Hortic 546:81–93. https://doi.org/10.17660/ActaHortic.2001.546.5

  • Bjørnstad G, Røed KH (2002) Evaluation of factors affecting individual assignment precision using microsatellite data from horse breeds and simulated breed crosses. Anim Genet 33:264–270

  • Cavalli-Sforza LL, Edwards AWF (1967) Phylogenetic analysis: models and estimation procedures. Am J Hum Genet 19:233–257

  • Cornuet JM, Piry S, Luikart G, Estoup A, Solignac M (1999) New methods employing multilocus genotypes to select or exclude populations as origins of individuals. Genetics 153:1989–2000

  • Dalvit C, De Marchi M, Dal Zotto R, Gervaso M, Meuwissen T, Cassandro M (2008) Breed assignment test in four Italian beef cattle breeds. Meat Sci 80:389–395

  • Fan B, Chen YZ, Moran C, Zhao SH, Liu B, Zhu MJ, Xiong TA, Li K (2005) Individual-breed assignment analysis in swine populations by using microsatellite markers. Asian Australas J Anim Sci 18:1529–1534

  • Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW (1995) Genetic absolute dating based on microsatellites and the origin of modern humans. Proc Natl Acad Sci U S A 92:6723–6727

  • Goodman SJ (1997) Rst Calc: a collection of computer programs for calculating estimates of genetic differentiation from microsatellite data and determining their significance. Mol Ecol 6:881–885

  • Goudet J (2001) FSTAT, a program to estimate and test gene diversities and fixation indices (version 2.9.3). Available from http://www.unil.ch/izea/softwares/fstat.html. Accessed 24 December 2017

  • Hauser L, Seamons TR, Dauer M, Naish KA, Quinn TP (2006) An empirical verification of population assignment methods by marking and parentage data: hatchery and wild steelhead (Oncorhynchus mykiss) in Forks Creek, Washington, USA. Mol Ecol 15:3157–3173

  • Iquebal MA, Sarika, Dhanda SK et al (2013) Development of a model webserver for breed identification using microsatellite DNA marker. BMC Genet 14:118

  • Iquebal MA, Ansari MS, Sarika DSP, Verma NK, Aggarwal RA, Jayakumar S, Rai A, Kumar D (2014) Locus minimization in breed prediction using artificial neural network approach. Anim Genet 45:898–902

  • Jaiswal S, Dhanda SK, Iquebal MA, Arora V, Shah TM, Angadi UB, Joshi CG, Raghava GPS, Rai A, Kumar D (2016) BIS-CATTLE: a web server for breed identification using microsatellite DNA markers. Curr Res Bioinforma 5:10–17

  • Jamieson A, Taylor SCS (1997) Comparisons of three probability formulae for parentage exclusion. Anim Genet 28:397–400

  • Kalinowski ST, Taper ML, Marshall TC (2007) Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol Ecol 16:1099–1106

  • Koskinen M (2003) Individual assignment using microsatellite DNA reveals unambiguous breed identification in the domestic dog. Anim Genet 34:297–301

  • Liu K, Muse SV (2005) PowerMarker: integrated analysis environment for genetic marker data. Bioinformatics 21:2128–2129

  • Nei M (1972) Genetic distance between populations. Am Nat 106:283–291

  • Nei M (1973a) The theory and estimation of genetic distances. In: Morton NE (ed) Genetic Structure of Populations. University Press of Hawaii, Honolulu

  • Nei M (1973b) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci U S A 70:3321–3323

  • Nei M, Tajima F, Tateno Y (1983) Accuracy of estimated phylogenetic trees from molecular data. J Mol Evol 19:153–170

  • Paetkau D, Calvert W, Stirling I, Strobeck C (1995) Microsatellite analysis of population structure in Canadian polar bears. Mol Ecol 4:347–354

  • Pérez-Enciso M (2017) Animal breeding learning from machine learning. J Anim Breed Genet 134:85–86

  • Piry S, Alapetite A, Cornuet JM, Paetkau D, Baudouin L, Estoup A (2004) GeneClass2: a software for genetic assignment and first-generation migrant detection. J Hered 95:536–539

  • Putnová L, Štohl R, Vrtková I (2018) Genetic monitoring of horses in the Czech Republic: a large-scale study with a focus on the Czech autochthonous breeds. J Anim Breed Genet 135:73–83

  • Rannala B, Mountain JL (1997) Detecting immigration by using multilocus genotypes. Proc Natl Acad Sci U S A 94:9197–9201

  • Rousset F (2008) Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol Ecol Resour 8:103–106

  • Talle SB, Fimland E, Syrstad O, Meuwissen T, Klungland H (2005) Comparison of individual assignment methods and factors affecting assignment success in cattle breeds using microsatellites. Acta Agric Scand Sect A-Anim Sci 55:74–79

  • Van de Goor LH, van Haeringen WA, Lenstra JA (2011) Population studies of 17 equine STR for forensic and phylogenetic analysis. Anim Genet 42:627–633

  • Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P (2004) MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Mol Ecol Notes 4:535–538

  • Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370

Acknowledgments

The authors would like to thank Professor Petr Hořín (Department of Animal Genetics, VFU Brno) for providing samples of the Camargue, Murgese, and Icelandic horses. This section would be incomplete without acknowledging Irena Vrtková, PhD (Laboratory of Agrogenomics), and her unwavering support over the years.

Funding

The research was funded by project NAZV QH92277 of the National Agency for Agricultural Research of the Ministry of Agriculture of the Czech Republic and by institutional support for the development of Mendel University in Brno. It was further supported by the Ministry of Education, Youth and Sports under project No. LO1210, carried out at the Centre for Research and Utilization of Renewable Energy.

Author information

Corresponding author

Correspondence to Lenka Putnová.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical statement

All procedures performed in studies involving animals were in accordance with the ethical standards of the institution or practice at which the studies were conducted.

Additional information

Communicated by: Maciej Szydlowski

Electronic supplementary material

Table S1

Multilocus Nm values (below the diagonal) and FST values (above the diagonal) for all pairs of the 43 populations studied, across all loci (n = 9261). (DOCX 48 kb)
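Table S1 pairs FST with Nm, the effective number of migrants per generation. A common way to derive Nm from FST is Wright's infinite-island relation, Nm = (1 - FST)/(4FST). Whether Table S1 was computed this way is an assumption of this sketch, and the relation rests on strong equilibrium assumptions, so Nm values obtained from it are best read as relative indices of gene flow. The function names are hypothetical.

```python
def island_model_nm(f_st: float) -> float:
    """Wright's infinite-island approximation: Nm = (1 - F_ST) / (4 * F_ST)."""
    if not 0.0 < f_st <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 - f_st) / (4.0 * f_st)


def island_model_fst(nm: float) -> float:
    """Inverse relation under the same model: F_ST = 1 / (4 * Nm + 1)."""
    if nm < 0.0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)


# Largest pairwise divergence quoted in the abstract (Friesian vs Equus przewalskii).
print(round(island_model_nm(0.378), 3))   # 0.411
```

Under this relation, low FST maps to high Nm: an Nm above 1 corresponds to an FST below 0.2, which is why the most closely related breed pairs show the largest Nm values in such a matrix.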

Table S2

The numbers of animals sampled per population and correctly assigned, and the individual assignment success rates for each population achieved using different assignment methods and numbers of microsatellite markers (GeneClass). (DOCX 35 kb)
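Among the GeneClass methods compared in Table S2 are distance-based ones, which assign an individual to the reference population that is genetically closest to it. The sketch below uses Nei's (1972) standard distance only; the eight methods presumably also involve other distances cited in the reference list (Cavalli-Sforza and Edwards 1967; Nei et al. 1983; Goldstein et al. 1995). Treating a single genotype as a sample with allele frequencies of 0.5 or 1.0 is an assumption of this illustration, and all names and frequencies are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]   # the two alleles an individual carries at one locus
Freqs = Dict[str, float]     # allele -> frequency at one locus


def genotype_as_freqs(g: Genotype) -> Freqs:
    """Treat one diploid genotype as a sample: 1.0 if homozygous, 0.5/0.5 if not."""
    f: Freqs = {}
    for allele in g:
        f[allele] = f.get(allele, 0.0) + 0.5
    return f


def nei_standard_distance(x: List[Freqs], y: List[Freqs]) -> float:
    """Nei's (1972) D = -ln(Jxy / sqrt(Jx * Jy)); J terms are averaged over loci."""
    n_loci = len(x)
    jx = sum(sum(p * p for p in fx.values()) for fx in x) / n_loci
    jy = sum(sum(q * q for q in fy.values()) for fy in y) / n_loci
    jxy = sum(
        sum(p * fy.get(a, 0.0) for a, p in fx.items()) for fx, fy in zip(x, y)
    ) / n_loci
    if jxy == 0.0:
        return math.inf          # no allele shared at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


def assign_by_distance(genotypes: List[Genotype], pops: Dict[str, List[Freqs]]) -> str:
    """Assign the individual to the reference population at the smallest distance."""
    ind = [genotype_as_freqs(g) for g in genotypes]
    return min(pops, key=lambda name: nei_standard_distance(ind, pops[name]))


# Hypothetical two-locus reference frequencies for two breeds.
pops = {
    "breed_A": [{"1": 0.9, "2": 0.1}, {"a": 0.8, "b": 0.2}],
    "breed_B": [{"1": 0.2, "2": 0.8}, {"a": 0.3, "b": 0.7}],
}
print(assign_by_distance([("1", "1"), ("a", "b")], pops))   # breed_A
```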

Table S3

The individual assignment success as calculated by GeneClass using the Bayesian method (Rannala & Mountain) for each horse breed (n = 2879). (DOCX 22 kb)
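Table S3 reports success rates for the Bayesian method of Rannala and Mountain (1997). As commonly described, that method gives each population's allele frequencies at a locus with k alleles a Dirichlet prior with every parameter equal to 1/k, and scores a genotype by its posterior predictive probability given the reference sample. The sketch below implements that formula under those assumptions; it is not GeneClass code, and the function names and allele counts are hypothetical. In practice the tested individual is usually removed from its own reference sample (a leave-one-out scheme), since self-assignment would otherwise inflate success rates.

```python
import math
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]   # two alleles at one locus
Counts = Dict[str, int]      # allele -> count in a reference sample at one locus


def log_genotype_prob(g: Genotype, counts: Counts, k_alleles: int) -> float:
    """Log predictive probability of a diploid genotype given a reference sample,
    with a Dirichlet prior whose k parameters all equal 1/k (they sum to 1)."""
    lam = 1.0 / k_alleles
    n = sum(counts.values())                 # alleles sampled at this locus
    a, b = g
    ca = counts.get(a, 0) + lam              # posterior Dirichlet parameter, allele a
    cb = counts.get(b, 0) + lam              # posterior Dirichlet parameter, allele b
    denom = (n + 1.0) * (n + 2.0)            # (n + prior sum) * (n + prior sum + 1)
    if a == b:
        return math.log(ca * (ca + 1.0) / denom)
    return math.log(2.0 * ca * cb / denom)


def assignment_scores(genotypes: List[Genotype], pops: Dict[str, List[Counts]],
                      k_alleles: List[int]) -> Dict[str, float]:
    """Multilocus likelihood of the genotype in each population, normalised so the
    scores over candidate populations sum to 1 (equal prior on populations)."""
    log_l = {
        name: sum(log_genotype_prob(g, c, k) for g, c, k in zip(genotypes, loci, k_alleles))
        for name, loci in pops.items()
    }
    top = max(log_l.values())                # subtract the maximum to avoid underflow
    w = {name: math.exp(v - top) for name, v in log_l.items()}
    total = sum(w.values())
    return {name: v / total for name, v in w.items()}


# Hypothetical allele counts (20 alleles = 10 horses per breed, two loci).
pops = {
    "breed_A": [{"1": 18, "2": 2}, {"a": 16, "b": 4}],
    "breed_B": [{"1": 4, "2": 16}, {"a": 6, "b": 14}],
}
scores = assignment_scores([("1", "1"), ("a", "b")], pops, k_alleles=[2, 2])
print(max(scores, key=scores.get), round(scores["breed_A"], 3))   # breed_A 0.92
```

The prior keeps an allele that is absent from a reference sample from producing a zero likelihood, which is one practical difference from a plain allele-frequency method.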

Table S4

The numbers of animals sampled per population and correctly assigned, and the individual assignment success rates for each population achieved using different assignment methods and numbers of microsatellite markers (the WEKA software). (DOCX 36 kb)

Table S5

Confusion matrix showing the performance of the Bayes Net classification model tested for breed identification (average accuracy 84.8%). (DOCX 21 kb)
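Table S5 summarises the Bayes Net model as a confusion matrix with an average accuracy of 84.8%. "Average accuracy" can mean either the overall share of correctly assigned individuals or the unweighted mean of per-breed success rates; the two differ when breeds have unequal sample sizes, and which one the table reports is not stated in this section. The sketch below computes both from a hypothetical matrix (rows are true breeds, columns predicted breeds); the function names are illustrative.

```python
from typing import List

Matrix = List[List[int]]   # rows = true breed, columns = predicted breed


def overall_accuracy(cm: Matrix) -> float:
    """Share of all individuals on the diagonal (correctly assigned)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_breed_recall(cm: Matrix) -> List[float]:
    """Assignment success per true breed (diagonal / row total);
    breeds with no test individuals are skipped."""
    return [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]


def mean_per_breed_recall(cm: Matrix) -> float:
    """Unweighted mean of per-breed success (each breed counts equally)."""
    recalls = per_breed_recall(cm)
    return sum(recalls) / len(recalls)


# Hypothetical 3-breed confusion matrix (not the study's Table S5).
cm = [
    [45, 5, 0],
    [10, 30, 0],
    [0, 2, 8],
]
print(overall_accuracy(cm))                   # 0.83
print(round(mean_per_breed_recall(cm), 4))    # 0.8167
```

Here the large first breed pulls the overall figure above the per-breed mean; with many small breeds, as in a 43-population study, the gap between the two definitions can be larger.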

About this article

Cite this article

Putnová, L., Štohl, R. Comparing assignment-based approaches to breed identification within a large set of horses. J Appl Genetics 60, 187–198 (2019). https://doi.org/10.1007/s13353-019-00495-x

  • DOI: https://doi.org/10.1007/s13353-019-00495-x
