Exploring the genetics of trotting racing ability in horses using a unique Nordic horse model
Horses have been strongly selected for speed, strength, and endurance-exercise traits since the onset of domestication. As a result, highly specialized horse breeds have developed with many modern horse breeds often representing closed populations with high phenotypic and genetic uniformity. However, a great deal of variation still exists between breeds, making the horse particularly well suited for genetic studies of athleticism. To identify genomic regions associated with athleticism as it pertains to trotting racing ability in the horse, the current study applies a pooled sequence analysis approach using a unique Nordic horse model.
Pooled sequence data from three Nordic horse populations were used for FST analysis. After strict filtering, FST analysis yielded 580 differentiated regions for trotting racing ability. Candidate regions on equine chromosomes 7 and 11 contained the largest number of SNPs (n = 214 and 147, respectively). GO analyses identified multiple genes related to intelligence, energy metabolism, and skeletal development as potential candidate genes. However, only one candidate region for trotting racing ability overlapped a known racing ability QTL.
As is not unexpected for genomic investigations of complex traits, the current study identified hundreds of candidate regions contributing to trotting racing ability in the horse. Likely resulting from the cumulative effects of many variants across the genome, racing ability continues to demonstrate its polygenic nature, with candidate regions implicating genes influencing both musculature and neurological development.
Keywords: Athleticism, Conformation, Genomic, Performance, Racehorse
Abbreviations
EBV: Estimated breeding values
ECA: Equus caballus chromosome
GDPD1: Glycerophosphodiester phosphodiesterase domain containing 1
NSCT: Norwegian-Swedish Coldblooded trotter
NSD: North Swedish Draught horse
PPM1E: Protein phosphatase Mg2+/Mn2+ dependent 1E
PRR11: Proline rich 11
RAD51C: RAD51 paralog C
SKA2: Spindle and kinetochore associated complex subunit 2
SORCS3: Sortilin related VPS10 domain containing receptor 3
TRA: Trotting racing ability
TRIM13: Tripartite motif containing 13
TRIM37: Tripartite motif containing 37
YPEL2: Yippee like 2
As genomics improves and enables the design of more targeted studies relating genotypes to phenotypes, the scope for studying non-model organisms continues to expand, creating greater opportunities to gain novel insight into the mechanisms regulating biological homeostasis and health [1, 2]. Genomic studies of natural model species, domestic species in particular, give a complementary view of genotype-phenotype relationships compared with the knowledge gained from the study of humans and experimental organisms. Since the onset of domestication, horses have been strongly selected for, among other things, speed, strength, and endurance-exercise traits. This diverse and, at times, divergent selection has ultimately led to the development of highly specialized horse breeds. Within the last 400 years, breed specialization has focused primarily on preserving and improving traits related to aesthetics and performance. As a result, most horse breeds today are closed populations with high phenotypic and genetic uniformity within breed. However, a great deal of variation continues to exist among breeds. This variation, combined with breed specialization, has made the horse particularly well suited for genetic studies of locomotion patterns and provides a unique opportunity for genetic studies of athleticism [1, 2, 3, 4, 5]. Generally speaking, athleticism describes the physical qualities that are characteristic of athletes and typically refers to traits such as strength, fitness, and agility. Many modern-day horse breeds exemplify some if not all of these traits, with shared selective pressures within breeds (e.g. health, fertility traits, conformation) and divergent selection between breeds (e.g. speed vs strength) yielding a wide range of athletic phenotypes [3, 6].
Despite a dispersed history of crossbreeding with Standardbreds (SBs), the relationship between the Norwegian-Swedish Coldblooded trotter (NSCT) and the North Swedish Draught horse (NSD) remains closer than that of either breed with the SB. While both the NSCT and the SB are selected for racing performance, the Norwegian and Swedish breed organizations have remained highly committed to preserving the historical work-horse appearance of the NSCT breed [7, 8]. As a result, both NSCTs and NSDs can be classified as heavy horse breeds, with NSCTs sometimes referred to as “draft trotters” (Fig. 1). Any lingering genetic similarities between the NSCT and the SB are therefore highly likely to be associated with favorable traits for trotting racing ability (TRA). Our aim was to identify these similarities using pooled whole genome sequence data from a carefully selected sample of NSCTs, NSDs, and SBs.
Table 1 Pooled population samples and genome information. Columns: number of individuals; nucleotide diversity (π) (%). Rows listed: Norwegian-Swedish Coldblooded trotter; North Swedish Draught.
Differentiation between breeds
Table: QTLs overlapped by trotting racing ability candidate regions (source cited in the table: Ricard et al. 2017).
For all intents and purposes, racehorses are professional athletes. Like professional human athletes, racehorses must not only endure the day-to-day physical demands of their sport, but they must also have a genetic capacity for athleticism relative to their sport in order to ultimately achieve success (i.e. win). However, unlike humans, racehorses have been carefully selected and bred for centuries, resulting in alleles with subtle effects on athleticism being enriched over time. As a result, racehorses in particular provide a unique opportunity to identify genes and subsequently the molecular mechanisms underpinning athletic ability. Rarely found outside of the Nordic countries, the NSCT is perhaps one of the most unusual types of racehorse in the modern era, originating not from historic racing breeds such as the Thoroughbred or Standardbred, but instead tracing its lineage back to the original North Swedish horse [10, 11]. Using whole-genome re-sequencing of pooled DNA from a carefully selected group of NSCTs, NSDs, and SBs, we capitalized on this unusual ancestry of the NSCT and identified 580 candidate regions for TRA in the horse.
Only one previously characterized QTL for racing ability was overlapped by the TRA candidate regions in the current study. A SNP in the first intron of the sortilin related VPS10 domain containing receptor 3 (SORCS3) gene, previously associated with racing speed in endurance horses, was overlapped by a 107.5 kb candidate region on ECA1. The gene encodes a type-I transmembrane receptor protein that is a member of a family of receptors with known pleiotropic functions. Genetic variation in SORCS3 has been associated with Alzheimer’s disease in humans, with more recent studies suggesting that additive epistatic effects of genetic variants within the gene may be important [17, 18, 19]. Furthermore, variation in SORCS3 has been associated with attention deficit hyperactivity disorder, and SORCS3 knockout mice display defects in spatial learning and memory, as well as increased fear extinction [20, 21]. Although the function of this gene in horses remains unknown, the transcript is generally expressed at high levels in the brain, so it could perhaps alter an individual’s perception of athletic competition (e.g. altered feedback loop in response to exercise, reduced fear extinction).
Interestingly, while a mere 1.9% (46.39 Mb) of the genome was covered by TRA candidate regions, the regions on ECA11 not only accounted for 11.2% of all TRA candidate regions identified, but also contained some of the highest numbers of SNPs (n = 100+) and the largest sweeps (> 50 kb in length). Furthermore, ECA11 had the highest concentration of candidate TRA regions when compared to the other equine chromosomes. However, despite the density of this TRA signal, no QTLs for racing ability have been mapped to ECA11.
Regions on ECA11 have, however, previously been associated with size (i.e. height and mass), which, given the design of the current study, is particularly interesting [3, 23, 24]. Although similar in height, NSCTs, NSDs, and SBs differ in their physique. SBs tend to be leaner and more refined in their appearance compared with NSCTs, while NSCTs tend to be leaner and more refined than NSDs (Fig. 1). For a region to be considered a candidate region for TRA in the current study, it had to be highly similar between NSCTs and SBs yet decidedly different from NSDs. It is possible that the strict inclusion of only top-performing NSCTs in the sequenced pool skewed the NSCT pool towards lighter-framed horses, thereby demonstrating what conceivably is a competitive racing advantage for lighter horses. Perhaps even more interesting is that no previously reported QTLs for growth or conformation traits overlapped any of the candidate regions for TRA on ECA11; however, a recent study in American Quarter Horses also suggested ECA11 as potentially important for racing ability.
ECA11 also contained the candidate region with the second largest number of SNPs (n = 147) in the study. The region, located at ECA11:32,874,000-33,557,500, encompasses 9 genes: RAD51C, PPM1E, ENSECAG00000003590, ENSECAG00000015244, GDPD1, YPEL2, SKA2, PRR11, and tripartite motif containing 37 (TRIM37). Mutations in TRIM37 are associated with mulibrey nanism in humans, an extremely rare autosomal recessive disorder characterized by profound growth delays and abnormalities of the muscles, liver, brain, and eyes [26, 27]. Intuitively, this would further support the candidate region being associated with body size; however, even if this is the case, it does not necessarily mean the region is solely associated with body size and shape. It is highly plausible that haplotypes associated with body size differ by multiple substitutions with pleiotropic functional effects. Mutations that impact underlying mechanisms for muscle, ligament, and tendon development would certainly influence TRA, limiting racing ability in some instances while enhancing it in others [28, 29, 30]. Moreover, a large conserved haplotype containing tripartite motif containing 13 (TRIM13), a gene located on ECA17, has previously been suggested as having selective importance in the Thoroughbred. TRIM37, while located on a different chromosome, is part of the same gene family as TRIM13.
This study identified hundreds of candidate genomic regions contributing to TRA in the horse, a result not unexpected for investigations into the genomics of such a complex trait. The trait is undoubtedly polygenic, resulting from the cumulative effects of many variants across the genome. Candidates for TRA implicated genes influencing musculature and conformation as well as genes involved in neurological development, further suggesting that racing ability may be a product not solely of physical characteristics but also of mental characteristics. This study identified a strong racing ability signal on ECA11 that will be particularly interesting for follow-up.
Genomic DNA samples from 18 NSCTs, 25 NSDs, and 22 SBs were prepared from blood samples and pooled in equimolar ratios prior to library construction (Table 1). Each horse was selected based on strict breed-specific criteria. All trotting horses racing in Norway and Sweden have breeding values estimated annually. For NSCTs, estimated breeding values (EBVs) are calculated using an animal model that includes the combined effect of country, sex, and birth year. The EBVs are based on racing performance results (i.e. racing status and earnings) occurring between 3 and 6 years of age. For inclusion in the current study, NSCT horses were required to have an EBV of at least 115, and sire/progeny ratios were restricted to reflect the larger population as accurately as possible (Additional file 4) [7, 8].
For EBVs in SBs, the animal model includes genetic base group and a combination of sex and birth year, with the evaluation based on racing performance results occurring between 2 and 5 years of age. For inclusion in the current study, SBs were also required to have EBVs of at least 115, and sampled SBs and NSDs were not allowed to share a common ancestor within three generations (i.e. no shared sires, dams, grandsires, granddams) [7, 9]. An EBV requirement for NSDs was not possible as EBVs are not calculated for this breed.
Pool sequencing, genome alignments, variant calling, and population analyses
Genome sequencing library construction and sequencing were carried out by SciLifeLab (Uppsala, Sweden) using two lanes on the Illumina HiSeq2500 (150 bp paired-end). Sequencing libraries were prepared from 100 ng DNA using the TruSeq Nano DNA sample preparation kit targeting an insert size of 350 bp. Reads were aligned to the Equus caballus genome (EquCab2.70) using BWA (v0.7.15). Duplicates were marked with Picard (v1.118; http://broadinstitute.github.io/picard/) and GATK was used for realignment around indels. Samtools (v1.8 [34, 35]) was used to generate the mpileup files needed for PoPoolation (v1.2.2) and PoPoolation2 (v1.201). Nucleotide diversity (π) was calculated across 5000 bp windows for each population pool using PoPoolation. PoPoolation2 was used to calculate FST over 1000 bp sliding windows with 50% overlap between the selected population samples using the Karlsson et al. method [36, 37]. Minimum count was set at 3, minimum coverage at 10, maximum coverage at 100, and minimum coverage fraction at 1.
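To make the windowing scheme concrete, the following is a minimal Python sketch of a windowed FST calculation on pooled allele frequencies. It is not the PoPoolation2 implementation and it does not use the Karlsson et al. estimator: it assumes the classical per-SNP definition FST = (H_T - H_S) / H_T for biallelic SNPs and averages per-SNP values within each window. The function names, the input layout, and the averaging step are illustrative assumptions, and the count and coverage filters described above are not applied here.

```python
from statistics import mean


def heterozygosity(p):
    """Expected heterozygosity of a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def snp_fst(p1, p2):
    """Classical FST = (H_T - H_S) / H_T for one biallelic SNP.

    p1 and p2 are the allele frequencies in the two pools.
    Returns None when the site is monomorphic across both pools.
    """
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-pool
    h_t = heterozygosity((p1 + p2) / 2.0)                   # combined pools
    if h_t == 0.0:
        return None
    return (h_t - h_s) / h_t


def sliding_windows(chrom_length, window=1000, step=500):
    """Yield half-open (start, end) windows; step = window / 2 gives 50% overlap."""
    start = 0
    while start < chrom_length:
        yield start, min(start + window, chrom_length)
        start += step


def window_fst(snps, chrom_length, window=1000, step=500):
    """Average per-SNP FST in each window (an assumption of this sketch).

    snps is a list of (position, p1, p2) tuples with 0-based positions.
    Returns (start, end, n_snps, mean_fst) for windows with at least one SNP.
    """
    results = []
    for start, end in sliding_windows(chrom_length, window, step):
        values = [snp_fst(p1, p2) for pos, p1, p2 in snps if start <= pos < end]
        values = [v for v in values if v is not None]
        if values:
            results.append((start, end, len(values), mean(values)))
    return results
```

For example, `window_fst([(100, 0.0, 1.0), (700, 0.5, 0.5)], chrom_length=1500)` places both SNPs in the first window and only the second SNP in the overlapping window that starts at 500 bp.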
Given the close relationship between NSCTs and NSDs, candidate regions for athletic traits were defined as genomic regions where FST values were relatively high between NSCTs and NSDs, but low between NSCTs and SBs. As such, stringent FST cutoffs (> 95th percentile, FST = 0.179 NSCT vs. NSD; < 5th percentile, FST = 0.013 NSCT vs. SB) were used when defining candidate regions. Windows with FST values that met these criteria were clustered into candidate sweep regions when they were less than 0.1 Mb from one another (custom R scripts). Clusters containing only a single 1000 bp window or fewer than 2 SNPs were excluded. Candidate gene screening was subsequently carried out using the bioinformatics database Ensembl (http://www.ensembl.org/). Candidate regions from the FST analyses were used to generate a list of annotated genes using the Ensembl BioMart function. The resulting list of candidate genes was then submitted to the PANTHER Classification System in order to obtain an overview of the molecular functions and biological processes affected by the candidate genes [39, 40]. Previously reported racing ability QTLs in the horse (downloaded from the horse QTL database) were also compared to differentiated regions to determine overlaps using BED file comparisons in BEDOPS.
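The filtering and clustering logic described above can be sketched in a few lines of Python. This is an illustration only and not the custom R scripts used in the study. The `Window` record, the function names, and the chromosome labels are hypothetical. The sketch also assumes that the distance between windows is measured from the end of one window to the start of the next, and that SNPs shared by overlapping windows are counted once; the text does not specify either detail.

```python
from dataclasses import dataclass

HIGH_CUTOFF = 0.179   # > 95th percentile, NSCT vs. NSD (from the text)
LOW_CUTOFF = 0.013    # < 5th percentile, NSCT vs. SB (from the text)
MAX_GAP = 100_000     # 0.1 Mb clustering distance (from the text)


@dataclass
class Window:
    """One FST window (hypothetical record layout for this sketch)."""
    chrom: str
    start: int
    end: int
    snp_positions: tuple
    fst_nsct_nsd: float
    fst_nsct_sb: float


def is_candidate(w):
    """High differentiation from NSDs, low differentiation from SBs."""
    return w.fst_nsct_nsd > HIGH_CUTOFF and w.fst_nsct_sb < LOW_CUTOFF


def cluster_windows(windows, max_gap=MAX_GAP):
    """Merge candidate windows lying less than max_gap apart into regions.

    Returns (chrom, start, end, n_snps) tuples, dropping clusters that
    consist of a single window or contain fewer than 2 distinct SNPs.
    """
    selected = sorted((w for w in windows if is_candidate(w)),
                      key=lambda w: (w.chrom, w.start))
    clusters = []
    for w in selected:
        last = clusters[-1] if clusters else None
        # Assumption: gap = start of this window minus end of the previous one.
        if last and last[-1].chrom == w.chrom and w.start - last[-1].end < max_gap:
            last.append(w)
        else:
            clusters.append([w])

    regions = []
    for cluster in clusters:
        # Count each SNP once even if it falls in two overlapping windows.
        snps = set()
        for w in cluster:
            snps.update(w.snp_positions)
        if len(cluster) < 2 or len(snps) < 2:
            continue
        regions.append((cluster[0].chrom, cluster[0].start,
                        max(w.end for w in cluster), len(snps)))
    return regions


def overlaps(region, qtl):
    """True if two half-open (chrom, start, end, ...) intervals overlap."""
    return region[0] == qtl[0] and region[1] < qtl[2] and qtl[1] < region[2]


example_windows = [
    Window("chr11", 0, 1000, (100, 700), 0.30, 0.010),
    Window("chr11", 500, 1500, (700, 1200), 0.25, 0.005),
    Window("chr11", 300_000, 301_000, (300_100, 300_200), 0.40, 0.001),
    Window("chr1", 0, 1000, (50, 60), 0.10, 0.010),
]
example_regions = cluster_windows(example_windows)
```

In this toy input, the two overlapping chr11 windows merge into one region, the isolated chr11 window is dropped as a single-window cluster, and the chr1 window fails the NSCT vs. NSD cutoff. The `overlaps` helper mirrors, in simplified form, the kind of interval comparison that BED tools perform when checking regions against QTL coordinates.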
We want to thank Christina Olsson, Annica Edberg, Knut Larsen, Thorvaldur Árnason, the Swedish Trotting Association, and the Norwegian Trotting Association for providing their support and the data/samples for the study. Sequencing was performed by the SNP&SEQ Technology Platform in Uppsala. The facility is part of the National Genomics Infrastructure (NGI) Sweden and Science for Life Laboratory. The SNP&SEQ Platform is also supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation.
This work was supported by the Swedish Research Council for Environment, Agricultural Science and Spatial Planning (Formas), 2016–00947, (GL). www.formas.se. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The data that support the findings of this study are available from the Swedish Trotting Association (Stockholm, Sweden) and the Norwegian Trotting Association (Oslo, Norway), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. However, data are available from the authors upon reasonable request and with permission of the Swedish Trotting Association (Stockholm, Sweden) and the Norwegian Trotting Association (Oslo, Norway).
BDV, KJF, and GL conceived and designed the experiments; BDV, KJF, MKR, MW, and ES contributed to sampling. GL and ES contributed the reagents and KJF and MKR extracted the DNA; BDV and ML analyzed the data and drafted the manuscript; KJF, MKR, MW, ES, MS, CFI, ML, and GL discussed and contributed to data analysis. All authors read and approved the final manuscript.
Ethics approval and consent to participate
All experimental procedures and sample collection methods were approved by the Ethics Committee for Animal Experiments in Uppsala, Sweden [Number: C 121/14]. Samples used in the study were already available at either the Animal Genetics Laboratory at SLU in Uppsala, Sweden or the Department of Basic Sciences and Aquatic Medicine at the Norwegian University of Life Sciences in Oslo, Norway as they previously had been used for parentage testing. Permission to use the samples was granted from the Swedish Trotting Association and the Norwegian Trotting Association (the owners of the samples per the rules/guidelines of the industry).
Consent for publication
Competing interests
The authors declare the following interest: GL is a co-inventor on a granted patent concerning commercial testing of the DMRT3 mutation: A method to predict the pattern of locomotion in horses. PCT EP 12747875.8. European patent registration date: 2011-05-05, US patent registration date: 2011-08-03. There are no further patents, products in development, or marketed products to declare.
7. Svensk Travsport: Uppfödning. https://www.travsport.se/artikel/uppfodning (2018). Accessed 31 Aug 2018.
8. Det Norske Travselskap: Næring og Avl. https://www.travsport.no (2018). Accessed 31 Aug 2018.
9. Föreningen Nordsvenska Hästen. http://www.nordsvensken.org/ (2018). Accessed 31 Aug 2018.
13. Árnason T, Bendroth M, Philipsson J, Henriksson K, Darenius A. Genetic evaluations of Swedish trotters-state of breeding evaluation in trotters. Proceedings of the European Federation of Animal Science symposium of the commission on horse production. Wageningen, the Netherlands: Pudoc; 1989. p. 106–29.
14. Pedigree Online All Breed Database. https://www.allbreedpedigree.com/ (2018). Accessed 31 Aug 2018.
29. Hill EW, McGivney BA, Gu J, Whiston R, MacHugh DE. A genome-wide SNP-association study confirms a sequence variant (g.66493737C>T) in the equine myostatin (MSTN) gene as the most powerful predictor of optimum racing distance for thoroughbred racehorses. BMC Genomics. 2010;11:552.
38. R Development Core Team: R: A Language and Environment for Statistical Computing. https://www.r-project.org/ (2018). Accessed 13 Jan 2018.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.