Background

The determination of copy number variants (CNVs) has become increasingly important for the evaluation of genomic traits in domestic animals [1]. CNVs have been shown to be a major source for genetic variation especially in complex traits influencing gene expression, phenotypic variation, adaption and the development of diseases [2, 3]. Analyses using different detection methods have been performed in diverse species like cows, pigs and horses [47]. Recent whole-genome sequencing revealed 282 CNVs in a quarter horse mare and suggested CNVs to be an important resource for future studies of complex diseases and traits in horses [8]. CNV studies for horses were carried out by the array comparative genomic hybridization (CGH) methodology and the CNV detection algorithm PennCNV using whole genome SNP genotyping chips [4, 5]. Both techniques have been shown to be a valid method to detect CNVs [3, 6, 9]. It was proposed that a comparison between these methods might be an important step for understanding the advantages and disadvantages of these platforms [9]. SNP arrays were proposed to be advantageous due to lower prices, lower signal-to-noise ratios and the use of the parameter B-allele frequency which facilitates the interpretation of results [1, 6, 10]. Furthermore, less samples per experiment were assumed to be required for SNP genotyping compared to CGH analyses [11]. However, the main bias for SNP arrays was considered to be the low SNP coverage of the genome in regions of CNVs due to difficulties of assay development and implementation [11]. In CGH arrays the genome coverage was shown to be highly dependent of the study design. In pig breeds CGH analyses have been proposed to be advantageous due to an enhanced marker density and a uniform distribution of probes [12]. CNV analyses in horses were performed on a CGH array which only targeted exons and therefore did not allow an identification of CNVs in intergenic regions [5]. This was taken into account for the benefit of detecting smaller CNVs in coding exons of annotated genes [5].

CNV calls in CGH as well as SNP arrays have been considered to be highly dependent on the algorithms used for the identification of CNVs [11, 13, 14]. For SNP arrays, various CNV detection programs have been available. Comparative analyses of CNV detection algorithms suggested that multiple predictions from different detection programs increased the confidence in the data and helped to eliminate false positive results [11]. Nevertheless the accuracy of CNV detection by different algorithms has also been shown to be limited due to false negative results [15, 16]. SNP array analyses in horses have not been performed in more than one CNV detection algorithm yet [4]. On the other hand, the analysed horse populations varied from a low number of horses (16) of diverse breeds used in the CGH study, to a higher number of horses (520) of a few breeds (4) in SNP array based PennCNV detections [4, 5, 17]. Due to the development of the horse population into a highly variable group, evaluations of different breeds have been shown to be important, especially for complex variations which are supposed to be challenging due to genetic heterogeneity and variations in the phenotypic expression [18, 19].

Genome-wide association studies (GWAS) for height have been performed for CNVs in human and revealed copy-number variants that were proposed to play a role in the development of short stature [20]. Recent studies in horses identified CNVs in the region of genes mainly involved in sensory perception, signal transduction and metabolism but also in candidates for neuronal homeostasis, coat colour, blood group antigens, keratin formation and height [5]. A quantitative trait locus (QTL) for stature was found on equine chromosome (ECA) 16 at 75 Mb. Size variation in horses has been investigated in various GWAS for single nucleotide polymorphisms (SNPs) using BeadChip data [19, 2123]. On the whole, five loci on ECA3, ECA6, ECA9, ECA11 and ECA28 have been discussed to be probably involved in the determination of withers height. A functional polymorphism on ECA3 was shown to be highly associated with body size and with the relative expression levels of the adjacent gene ligand dependent nuclear receptor corepressor-like (LCORL)[23]. It was assumed that LCORL might be a main regulator for the determination of body size in horses.

The aim of this study was to perform CNV detection analyses in accordance with current standards using three CNV detection algorithms in a large number of horses of various breeds and to compare these results with current microarray studies. The CNVs detected were further analysed for their association with body size as a model for complex traits.

Results and discussion

CNV detection

The detection of CNVs was performed on the data of the Illumina Equine SNP50 beadchip using the algorithms CNVPartition (Illumina), PennCNV [17] and QuantiSNP [24]. Analysis revealed 166, 860 and 1090 CNVs using these programs for the detection (Additional file 1, Additional file 2 and Additional file 3). The mean size for all detected CNVs was 487,562 bp and the median 169,367 bp. Considering the distribution of CNVs over the chromosomes, the detection results of PennCNV revealed the largest number of CNVs on ECA1, ECA12 and ECA13 while the results of QuantiSNP showed an enrichment of CNVs on ECA1, ECA3 and ECA12 (Figure 1). Detection analysis by CNVPartition revealed a high number of detected CNVs on ECA1, ECA12 and ECA23. However, the chromosomes with the highest numbers of CNVs did not necessarily show the highest coverage with CNV regions. We found strong CNV coverage enrichment on ECA23 (CNVPartition), ECA13 (PennCNV), ECA27 and ECA28 (PennCNV and QuantiSNP) and ECA12 (all three programs, Table 1). Across all three detection algorithms, ECA12 was not only significantly enriched by CNVs but also showed the largest number of detected CNVs. An accumulation of CNVs has also been reported in CGH analyses in horses for ECA12, ECA17 and ECA23 and was shown in Illumina Equine SNP50 beadchip based PennCNV analyses for ECA1, ECA2 and ECA17 [4, 5]. We assume that the amount of detected CNVs on specific chromosomes is dependent on the detection method and can vary among different populations. Nevertheless, there is much evidence to presume that ECA12, ECA27 and ECA28 are considerably enriched for CNVs.

Figure 1
figure 1

Chromosomal distribution of CNVs detected by different detection algorithms. (A) Detection results of CNVPartition. (B) Detection results of PennCNV. (C) Detection results of QuantiSNP.

Table 1 Chromosomal enrichment of CNVs detected by three different algorithms

Comparison between three detection programs

Comparative analysis between the algorithms showed that PennCNV and QuantiSNP detected similar numbers of CNVs identified on each autosome. They displayed a CNV detection overlap of 28.4% and 22.8% (Figure 2). The percentage of CNVs overlapping with CNVPartition, the algorithm with the lowest number of detected CNVs, was 32.5% (PennCNV) and 37.4% (QuantiSNP). In total, 50 CNVs could be detected by all three programs (Additional file 4). The average size of these 50 CNVs detected by all three programs was 388,892 bp and ranged from 516 to 978,353 bp (median size 293,244 bp). The number of losses and gains was computed among those breeds that revealed a CNV in all three algorithms (see Additional file 4). On the whole, five CNVs showed higher copy numbers in some and lower copy numbers in other horses while further 28 CNVs only displayed losses and 17 CNVs only gains in these horses.

Figure 2
figure 2

Overlapping CNVs from the three CNV-detection programs used in analysis. 50 CNVs could be detected by all three algorithms.

Comparisons between our detection results of the 50 CNVs detected in all three programs with recent CNV studies [4, 5] showed that these 50 CNV regions were highly similar to those CNV regions in previous studies especially for SNP array analyses (Table 2). Nevertheless, the study by Dupuis et al. revealed a considerably higher number of detected CNVs [4]. We propose that this effect was a result of more stringent quality criteria and the combination of three detection algorithms used in our study.

Table 2 CNV regions from 50 CNVs of three detection algorithms in comparison with recent studies

CGH analyses showed less concordance with our detection events. It was proposed that the results of CGH and SNP arrays were generally not easily comparable [1]. In particular, the CNV-study designs in horses were different, as CGH analyses only targeted exons, while the SNP arrays covered the whole genome. Furthermore, our analysis was performed on a considerably higher number of samples that might allow a more general view on the distribution of CNVs in the horse population. Comparisons between the CNV regions detected by CGH analyses and those identified by the three SNP array detection algorithms individually revealed a relatively high overlap with the detection results of QuantiSNP (25%), less consistency with the PennCNV results (7%) and an extremely low overlap with CNVPartition (3%, Table 3). Despite the relatively low overlap in total, PennCNV showed concordance up to 100% in some chromosomal regions. These comparisons confirm the assumption that the prediction accuracy of CNVPartition seems to be relatively poor due to the high rates of missed events [11]. We suppose that this was the reason for the considerably high number of detected CNVs by individual programs in comparison with the number of overlapping detection events. A closer look at each algorithm showed that PennCNV and QuantiSNP provided a similar distribution of CNVs over the chromosomes and had a considerably higher number of detection events in comparison with CNVPartition. PennCNV and QuantiSNP are both hidden Markov model (HMM) based algorithms that use the log R ratio and B allele frequency independently (QuantiSNP) or in combination (PennCNV) [25]. Comparative analyses of CNV detection methods for SNP arrays confirmed that PennCNV and QuantiSNP had a large overlap of detection events [11]. A study of bladder cancer in human which used a study design similar to ours suggested PennCNV and QuantiSNP to be a more reliable method for the detection of CNVs than CNVPartition [26]. QuantiSNP was even assumed to be the best of these three methods and also outperformed the other algorithms [25].

Table 3 Comparison between 754 CNV regions detected by CGH analyses and the SNP array detection results of CNVPartition, PennCNV and QuantiSNP

CNV sharing among breeds

After filtering out samples with low call rate and quality features we were able to detect CNVs in 717 horses. The total number of detected (identical and not identical) CNVs was 4013 for these horses of different breeds. Due to the choice of unrelated horses, we supposed this number of CNVs gives a good view on the distribution of CNVs in the horse population with the restriction that different numbers of samples were available per breed. Comparative analysis of all three algorithms revealed 536 CNVs that could be detected in all programs for the same breed. These CNVs were found in Arabian, Hanoverian, Holsteiner, Lusitano, Maremanno, Oldenburg, Thoroughbred, Westphalian (Table 4). With regard to the individuals we could confirm 21 CNVs that were detected in the same horse using all algorithms. Theses CNVs ranged in size from 516 to 862,853 bp and showed an average size of 368,720 bp. Further 29 CNVs showed an overlap among different breeds (Table 5).

Table 4 Distribution of detected CNVs among different breeds
Table 5 Number of CNVs derived from the 50 calls of all three detection algorithms that overlap among different breeds

Comparisons between the 50 CNVs derived from all three detection algorithms showed that the largest number of CNVs occurred in the Hanoverians, supposably due to the largest number of samples used in this study. On the whole 18 Hanoverian specific CNVs could be detected. Two of these showed gains and losses in different horses (Additional file 4). In general, only one CNV derived from the comparative analysis of three detection algorithms showed losses and gains among different breeds. This CNV was located on ECA1 at 155.49-155.55 Mb and showed losses in Hanoverian and Lusitano and gains in Holsteiner and Thoroughbred horses.

Furthermore a relatively high number of CNVs shared with the Hanoverian could also be found in Oldenburg, Westphalian and Holsteiner, presumably due to the close relationship among these breeds. A CNV overlap between Arabian, Lusitano and Maremanno horses could be explained by the strong influence of the Arabian and Lusitano bloodlines on the Maremanno breed. In addition, we investigated the distribution of CNVs in two Przewalski horses and detected on the whole 10 CNVs which could not be identified by all three detection algorithms but showed an overlap with different breeds (Additional file 5). We assume that these CNVs might be conserved for a long time and passed on during the domestication of the horse.

Evaluations of the individual horses showed that most horses shared their CNVs with at least another animal according to previous observations [5]. Comparing the average size of CNVs detected in more than one animal (397,708 bp) with the size of CNVs not shared by a second horse (250,773 bp) confirmed the suggestion that CNV sharing among horses is correlated with CNV length as larger CNVs are more likely to be shared [5].

Analysis of genes within CNV regions

Analysis of genomic regions of 50 CNVs, derived from comparative analysis of CNV detection algorithms, revealed 153 different genes within 45 CNVs. In five CNV regions we were not able to find any functional gene. The major category of genes consisted of olfactory receptor (OR) genes (66.7%), a group of genes which is known to be significantly enriched in CNV regions in human, cattle, pig and rat [6, 7, 27, 28]. It was supposed that the overrepresentation of OR genes is not a result of positive selection but of the frequent appearance of these genes in segmentally duplicated regions [27]. These regions are known to be more susceptible to CNVs while genomic regions with dosage-sensitive genes usually show a low number of CNVs [3, 27].

Functional analysis was performed using human orthologs of the horse genes due to the poor annotation of the horse genome. The results of the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.7 showed an enrichment of 131 genes involved in sensory perception, signal transduction and cellular components (Additional file 6). The highest P-values could be observed for the classifications olfaction (P = 4.20x10-142), olfactory receptor (P = 2.40x10-134) and sensory perception of smell (P = 6.90x10-132). Further analyses with the PANTHER classification system revealed statistically over- and underrepresented biological processes and confirmed the overrepresentation of genes involved in sensory perception (P = 4.53x10-01) and signal transduction as well as response to stimuli (Table 6). Processes involved in cell cycle, transcription and metabolism were underrepresented.

Table 6 Gene ontology analysis of significantly over- and underrepresented genes

Analysis of the molecular functions demonstrated significantly overrepresented processes concerning receptor activities and particularly underrepresented catalytic and hydrolase activities (Additional file 7). The PANTHER protein class analysis showed a comparatively reduced number of nucleic acid binding, transferase and hydrolase proteins. Our results suggest that CNV regions are not randomly distributed among the genes of the horse genome, but they harbour certain genes overrepresented in those regions while other genes involved in specific biological processes are underrepresented. Similar distributions were observed in cattle and in CGH array analyses of 16 horses, showing sensory perception and signal transduction to be the primary processes affected by CNVs [5, 7].

Furthermore, an enrichment of genes involved in metabolic processes was also observed in CGH analyses of horses [5]. Our study revealed an impoverishment of CNVs in regions of genes responsible for a metabolic process. This term is defined by PANTHER as chemical reactions and pathways by which living organisms transform chemical substances. A closer look at our analysis of metabolic pathways shows that primary metabolic processes, protein metabolic processes, nucleobase, nucleoside, nucleotide and nucleic acid metabolic processes were underrepresented while the cyclic nucleotide metabolic process was overrepresented. We assume that metabolic process is a very general term and has to be interpreted in context of more specific ontology terms for metabolic processes. Furthermore, we suppose that the high number of horses used in our study in comparison with previous CGH analysis might give a more general view on the distribution of enriched and impoverished genes in CNV regions. Similar to our results, human CNV studies revealed a significant underrepresentation of genes involved in nucleic acid metabolism in deletion regions [29]. Furthermore, enriched gene categories concerning sensory perceptions of smell, chemical stimuli, smell and taste as well as neurophysiological processes, brain development, immune responses and external biotic stimuli could be shown in human [3, 30, 31]. It was assumed that the underrepresentation of genes occurs due to the strong selective pressure for genes involved in important processes like transcriptional regulation and development [32].

Association analysis with body size

GWAS for body size was performed for each algorithm on basis of its CNV detection results and for the intersection of all three detection programs. A highly genome-wide significant peak on ECA1 could be shown for the data generated by PennCNV (P = 0.006) and QuantiSNP (P = 0.010). Analysis of the intersection of 50 CNVs revealed a P-value at the threshold of significance (P = 0.057) presumably influenced by CNVPartition which did not reveal any significant association. The associated CNVs were deletions in the region of 156 Mb (Table 7). They were located in the area of the candidate genes olfactory receptor 4, subfamily K5 (OR4K5), subfamily K2 (OR4K2), subfamily N2 (OR4N2) and subfamily M1 (OR4M1). GWAS for copy number variation in human revealed the syntenic region of 40.25-40.39 Mb to be significantly associated with stature [20]. It was proposed that individuals with short stature show an excess of lower-frequency deletions [20]. Our analysis revealed the heterozygous deletions on ECA1 to be associated with larger sized horses.

Table 7 Genome-wide associated CNVs for body size in horses

A second CNV region, with the highest association for body size (P = 0.0002), was located on ECA8 at 4.43-4.62 Mb. All these heterozygous deletions could be found in larger sized horses and were absent in smaller horse breeds. The peak region harboured the candidate genes immunoglobulin lambda variable 3–19 (IGLV3-19), variable 3–27 (IGLV3-27) and variable 2–11 (IGLV2-11) which all belong to the immunoglobulin lambda light chain variable gene cluster, which is important for immunoglobulin structure [33].

A third candidate region could be detected by QuantiSNP on ECA9 (P = 0.006) at 29.89 Mb. This CNV showed deletions for 37 larger sized warmblood horses. The neighbouring candidate genes opioid receptor, kappa 1 (OPRK1) and ATPase, H + transporting, lysosomal, 50/57-KD, V1 subunit H (ATP6V1H) have been shown to be associated with body conformation in pigs [34]. Vacular-type H + ATPase proton pump is a complex located in the ruffled border plasma membrane of bone-resorbing osteoclasts and is important for bone resorption. Mutations or deletions in V-ATPase subunits encoding genes have been shown to decrease resorptive activity in bones [3537]. Further candidate genes in the region of the associated CNV are the v-yes-1 yamaguchi sarcoma viral related oncogene homolog (LYN), the trimethylguanosine synthase, S. cervisae, homolog of (TGS1) and the pleiomorphic adenoma gene 1 (PLAG1). Genome scans for sequence variants in human revealed these genes to be in the region of SNPs with the strongest correlation for height in a meta-analysis [38]. The candidate gene PLAG1 is known to be involved in developmental processes [39, 40]. Its influence on bovine stature could be shown by variants modulating the expression of a chromosome domain encompassing PLAG1[39]. A targeted disruption of PLAG1 in mice caused early growth retardation which was maintained throughout adult life [40].

Our results showed that body size in horses is mainly associated with homozygous or heterozygous deleted CNVs on ECA1, ECA8 and ECA9. These regions do not correspond with the region of the potential main regulator LCORL or further associated SNPs or CNVs for body size in horses [5, 2123]. We assume that the associated CNVs detected in our analysis might represent additional regulators in the complex process for the determination of height.

The association of the CNV regions on ECA1 and ECA8 could be confirmed by two detection algorithms, while the peak region on ECA9 was exclusively detected by QuantiSNP. The Illumina-based algorithm CNVPartition did not reveal any association for body size although the associated CNVs on ECA1 and ECA8 could be detected by all three programs. We assume that the generally low number of detected CNVs in only few horses were the reason for the missing association. For this reason we propose that CNVPartition did not allow an appropriate association analysis due to the high false negative rates. This correlates well with our analysis for the detection accuracy of each algorithm. The results demonstrate certain limitations that have to be considered for the use of multiple predictions. It was proposed that analyses of different detection algorithms have to be taken with care due to false negative events but are also useful to avoid false positives caused by noisy data and call attention to discrepancies in the data [11, 15]. PennCNV and QuantiSNP were shown to be more reliable in detecting CNVs than CNVPartition [26]. Our analysis confirmed this suggestion and showed that not only the number of compared CNV detection events but also the choice of the program is important for an effective analysis.

qPCR validation

For validation of detected CNVs we performed qPCR analysis in two CNVs on ECA1 and one CNV on ECA8 for twenty horses each (Additional file 8). We chose three CNV regions that were detected by all three programs of which two were associated with body size and one was also validated in horses of previous analysis [5]. All three CNV regions could be confirmed by qPCR (Figure 3). The rate for the accurate copy number detection of at least one algorithm was 95% for Olfr1284 (ENSECAG00000006791) on ECA1 and 80% for IGLV3-32 (ENSECAG00000005113) on ECA8. For the CNV region on ECA1 at the candidate gene OR4K2 (ENSECAG00000006318) all horses could be validated. The results show that the analysed CNVs could be validated although few horses did not show the copy number detected by SNP chip analysis. Similar analyses in pigs revealed accurate discovery rates of 71% [6].

Figure 3
figure 3

Verification of detected CNVs by qPCR. The horizontal line represents the relative quantification level 1. Two copies are shown by squares, one copy by triangles, zero copies by circles and three copies by rhombus. (A) The CNV region on ECA1 at 155.63 Mb was validated in nineteen horses (solid symbols). One horse showed a homozygous deletion instead of two copies (unfilled symbol). (B) The CNV region on ECA1 at 156.69 Mb was validated in all twenty horses (solid symbols). (C) The CNV region on ECA8 at 4.50 Mb was validated in sixteen horses (solid symbols). Four horses showed different copies as detected by SNP chip analysis (unfilled symbols).

Considering the individual programs, the false negative discovery rate of CNVPartition was particularly high, while QuantiSNP showed the lowest false negative discovery rates in all three CNV regions (Additional file 9). We assume that these results underline our comparative analyses for the detection algorithms showing QuantiSNP to be the most reliable and accurate program.

In addition to this validation, we compared CNVs of previous CGH analyses that were validated by qPCR with our detection results. On the whole, four previously verified CNVs could be confirmed in our analysis. One of these CNVs was rare in our population and could only be detected by QuantiSNP and CNVPartition, further three CNVs could be found in a considerably higher number of horses by QuantiSNP and also by PennCNV in contrast to CNVPartition (Additional file 10). This comparison helps to verify our CNV detection results in these regions and confirms the assumption that there are significant differences in the detection abilities of the analysed algorithms. We conclude that a validation of detected CNVs is crucial especially for further analyses of associated regions, and helps to evaluate the accuracy of the detection results by specific programs.

Conclusions

The aim of our analysis was to investigate three different algorithms for the detection of CNVs in horses and to find associated CNVs for the regulation of body size. Comparative analysis of the detection programs in 717 horses revealed 50 CNVs with an average size of 388,892 bp. Functional analysis of genes located in these CNVs confirmed the high amount of OR genes (66.7%) and showed an overrepresentation of genes involved in sensory perception, signal transduction and cellular components while cell cycle, transcription and metabolic processes were underrepresented.

We conclude that in general the creation of an intersection of three CNV-programs is useful to increase the accuracy of CNV detection and to reduce the number of false positive results. Nevertheless, the comparison between these programs also provides a strong restriction and a higher number of false negative results which is highly dependent on the choice of the detection tool. We recommend taking advantage of the different algorithms used in detection programs and to perform multiple predictions in a first step. Such an analysis will show possible limitations and establish suitable algorithms for further evaluations. In our study the combined use of PennCNV and QuantiSNP was most effective for the accurate detection of CNVs. GWAS for the CNV detection results of these algorithms identified for body size three homozygous or heterozygous deleted CNV regions associated with larger sized horses on ECA1, ECA8 and ECA9. Two of these regions were analysed by qPCR and could be validated. We conclude that a comparative CNV analysis is a useful approach as it reveals the limitations of individual programs and helps to estimate the reliability of the detection results.

Methods

Ethic statement

All animal work has been conducted according to the national and international guidelines for animal welfare. The sampling was approved by the Lower Saxony state veterinary office Niedersächsisches Landesamt für Verbraucherschutz und Lebensmittelsicherheit, Oldenburg, Germany (registration numbers 509c-42502-01A60, 02A-138 and 07A-482).

Sample preparation

EDTA-blood samples of 17 different breeds were collected including 148 Arabian, one Anglo-Arabian, two Brandenburger, 514 Hanoverian, nine Holsteiner, 13 Oldenburg, five Trakehner, 35 Westphalian, one Selle Francais, one German Riding Pony, 47 Lusitano, 48 Maremanno, two Przewalski, 12 Rhinelander horse, two Rhenish-German Cold-Blood, 14 Thoroughbred and one Zweibrücker. The horses were chosen out of different breeding lines to provide unrelated horses as far as possible. For genotyping we isolated genomic DNA using standard methods with RBC (Red Blood Cell) lysis buffer and SE (sodium EDTA) buffer. The DNA concentration of the samples was adjusted to 50 ng/μl using the Nanodrop ND-1000 (Peqlab Biotechnology, Erlangen, Germany) and quality control was performed by gel electrophoresis using 1% agarose gels (peqGold Universal Agarose, Peqlab Biotechnologie, Erlangen, Germany). All 854 animals were genotyped using the Illumina equine SNP50 BeadChip (Illumina, San Diego, USA) including 54,602 SNPs.

Data analysis

To provide reliable results, quality control was performed in a first step by the choice of DNA samples with high qualities and the determination of a call rate >90% for animals and SNPs. Further criteria for the exclusion of noisy data were chosen individually for each algorithm. Primary visualisation and quality control of the CNV data was performed with the GenomeStudio software (Illumina). Data files containing SNP name, chromosome, position, B-allele frequency and log R ratio were exported as a unique file and separated using the split option in PennCNV package. Analysis of the autosomal regions for CNV was done in three different programs. PennCNV was run according to default criteria (Illumina) [17] using the command line detect_cnv.pl –hmm example.hmm –pfb horse909Tiere.pfb –minsnp 3 –lastchr 31 –test –tabout –coord –conf –listfile list.txt –out penncnv1.txt and probe counts less than three were omitted afterwards. Quality control was performed employing standard exclusions of the Log Ratio (SD (LRR)) >0.3 and the GC-content caused fluctuation of signal intensity (│GCWF│) >0.02. An individual-call mode was used for all samples. Then we used QuantiSNP 2.0, a program based on an Objective Bayes Hidden-Markov Model (HMM) [24], for CNV analysis. The detection was performed by the command: quantisnp2 –outdir horse –logfile 909Pferde –chr [1:31] –chrx 32 –beadstudio-files 909Pferde.txt. After CNV detection the minimum probe count of three, a threshold for the Log Bayes Factor of less than ten was set. The GenomeStudio plug-in cnvPartition 3.1.6 was run by default criteria including a confidence threshold of 35, a minimum homozygous region size of 1,000,000 a minimum probe count of 3 and GcWaveAdjustLRR.

GWAS for the CNV detection results of each individual programme and the intersection of PennCNV, QuantiSNP and cnvPartition was performed using PLINK version 1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) [41] and SAS/Genetics, version 9.3 (2013) in a quantitative trait analysis. A mixed linear model (MLM) was employed to control data for stratification. Size ranges for every breed were estimated and results were averaged as it was shown in our previous analysis for body size (Table 4) [23]. The chromosomal enrichment was also accounted with SAS/Genetics for each algorithm similar to the enrichment analysis in the CGH study [5] by merging overlapping CNVs on basis of the SNP map of 48860 SNPs to CNV regions. The number of SNPs covered by CNV regions was divided by the length of the chromosome. The whole-genome coverage of CNV regions was used as standard after dividing by the total length of the autosomes. Significant chromosomal enrichment occurred when the enrichment of the genome was lower than the enrichment of the chromosome.

Ontology analysis

Genes located in the CNV regions of the intersection of all three programs were identified using NCBI MapViewer http://ftp.ncbi.nih.gov/genomes/MapView/ Equus_caballus. Due to the insufficient annotation of the horse genome, we determined the human orthologs for these genes by NCBI gene http://www.ncbi.nlm.nih.gov/gene and HomoloGene http://www.ncbi.nlm.nih.gov/homologene to perform ontology analysis. The Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.7 (http://david.abcc.ncifcrf.gov/home.jsp) was used for functional annotation of genes in CNV regions [42, 43]. The functional annotation chart was run with a count threshold of 2, an EASE Score threshold of 0.1 and the Bonferoni correction. Results were grouped in functional groups. Further ontology analysis was performed using the PANTHER (Protein ANalysis THrough Evolutionary Relationships, version 8.0) classification system http://www.pantherdb.org/ for the classification of genes by their molecular function, biological process and cellular component with default Bonferoni correction [44].

qPCR validation

Validation of three different genomic regions harbouring CNVs was performed by quantitative real-time (qRT)-PCR on the ABI7300 sequence detection system (Applied Biosystems by Life Technologies, Darmstadt, Germany). On the whole eighteen warmblood horses and two Arabians were analysed. DNA was isolated from blood samples in the same way as it was done for SNP chip analysis. The DNA concentration of the samples was adjusted to 10 ng/μl using the Nanodrop ND-1000 (Peqlab Biotechnology).

Primer pairs and probes were designed using Primer express 3.0 (Applied Biosystems) and Primer3 (http://frodo.wi.mit.edu/primer3/). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) on ECA6 was used as reference gene according to previous CNV analyses in horses [5]. Reactions were assembled in a final volume of 12 μl containing 2.0 μl gDNA, 1.0 μl reverse and 1.0 μl forward primes (10 pmol; Additional file 6), 0.25 μl VIC-labeled TaqMan probe for CNV regions (6 nmol; 100 μM) and FAM-labeled TaqMan probe for GAPDH, 1.68 μl nuclease free water and 6 μl Maxima Probe qPCR master mix 2x supplemented by 0.07 μl ROX Solution (50 μM; Fermentas Life Sciences, St. Leon-Rot, Germany). Quantitative real-time (qRT)-PCR was performed for 10 minutes (min) at 95°C followed by 40 cycles at 95°C for 15 seconds and 60°C for 1 min. Analysis was done for every sample two times and the average value was used for further calculations. One sample without copy number variations in the analysed regions was used as reference. For the relative quantification of CNVs we used the 2-ΔΔCt method [45]. The ΔCT values of the test samples were subtracted from the ΔCT of the reference sample and put in the formula 2-ΔΔCt.