Runs of homozygosity and population history in cattle
- Cite this article as:
- Purfield, D.C., Berry, D.P., McParland, S. et al. BMC Genet (2012) 13: 70. doi:10.1186/1471-2156-13-70
Runs of homozygosity (ROH) are contiguous lengths of homozygous genotypes that are present in an individual due to parents transmitting identical haplotypes to their offspring. The extent and frequency of ROHs may inform on the ancestry of an individual and its population. Here we use high density (n = 777,962) bi-allelic SNPs in a range of cattle breed samples to correlate ROH with the pedigree-based inbreeding coefficients and to validate subsequent analyses using 54,001 SNP genotypes. This study provides a first testing of the inference drawn from ROH through comparison with estimates of inbreeding from calculations based on the detailed pedigree data available for several breeds.
All animals genotyped on the HD panel displayed at least one ROH between 1 and 5 Mb in length, with certain regions of the genome more likely to be involved in a ROH than others. A strong correlation (r = 0.75, p < 0.0001) existed between the pedigree-based inbreeding coefficient and a statistic based on the sum of ROH of length > 0.5 Mb, suggesting that, in the absence of an animal’s pedigree data, the extent of a genome under ROH may be used to infer aspects of recent population history even from relatively few samples.
Our findings suggest that ROH are frequent across all breeds but differing patterns of ROH length and burden illustrate variations in breed origins and recent management.
Keywords: Runs of homozygosity; Inbreeding; Cattle population history
Runs of homozygosity (ROH) are contiguous lengths of homozygous genotypes that are present in an animal due to parents transmitting identical haplotypes to their offspring. The extent and frequency of these may inform on the ancestry of an individual and its population. In particular, consanguinity may be indicated by the presence of long ROH; the longer such segments are, the more likely it is that recent inbreeding occurred within a pedigree. However, unusually long runs of homozygosity may also persist in outbred individuals, perhaps due to unusual mutation, linkage disequilibrium (LD), and recombination rates at certain genomic locations.
The distribution of shorter ROH may also inform on the presence of more ancient relatedness which is unaccounted for in an individual’s recorded pedigree due to limitations in the recording process. In the context of domestic animals, these may result from breed or population founder effects or other restrictions.
The domestication of cattle, which occurred ~10,000 years ago, was a complex process, with evidence suggesting that it occurred in a minimum of two domestication events. However, natural and artificial selection of cattle, as well as regional variation due to drift, have resulted in breeds that differ extensively in phenotype. These processes and the extent of breeding control have differed greatly among populations, and ROH may provide useful information on these disparate histories. In particular, the intense selection of sires, artificial insemination and embryo transfer have featured heavily in some breeds in recent times, reducing effective population sizes and genetic diversity and affecting levels of homozygosity.
Runs of homozygosity have been extensively studied in human populations and are an established method of distinguishing a population history of consanguinity; homozygosity mapping analyses have also shown a relationship with susceptibility to recessive diseases [1–3, 5, 6]. Here we use high density (n = 777,962) bi-allelic SNP data in a range of cattle breeds to correlate ROH with the pedigree-based inbreeding coefficient and to validate further analysis using 54,001 SNP genotypes. This allows examination and interpretation of the level of ROH that exists in a wide range of cattle breed samples.
Genotypes and quality control
Single nucleotide polymorphism (SNP) genotypes consisting of 777,962 biallelic SNPs from the BovineHD BeadChip (Illumina Inc., San Diego, CA) were generated for 891 artificial insemination sires of multiple breeds. Breeds represented included Angus (n = 39), Belgian Blue (n = 38), Charolais (n = 117), Friesian (n = 98), Hereford (n = 40), Holstein (n = 262), Holstein-Friesian crosses (n = 111), Limousin (n = 128) and Simmental (n = 58). Additionally, the 48,734 SNPs common to both the HD panel and the Illumina BovineSNP50 BeadChip were retained in a reduced panel, hereafter referred to as the reduced HD panel, which was used to evaluate the lower density of circa 50 k SNPs for identifying ROH. In addition, three published genotype datasets at this density, comprising a total of 1166 animals from 42 different breeds (detailed in Additional file 1) and generated from the Illumina BovineSNP50 BeadChip, hereafter known as the SNP50 panel, were also available [7–9].
Table. Number of single nucleotide polymorphisms for each data edit implemented on each density panel (columns include the reduced HD panel)
- Initial data set
- SNPs and animals with >90% call rate
- Hardy-Weinberg equilibrium in each breed (p < 0.0001)
- Monomorphic SNPs removed
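The data edits listed above can be sketched per SNP as follows. This is a minimal illustration and not the pipeline used in the study: the function names, the 0/1/2 genotype coding with `None` for missing calls, and the one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium are all assumptions made here. The text itself gives only the thresholds (>90% call rate, p < 0.0001, removal of monomorphic SNPs).

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing calls (genotypes coded 0/1/2, None = missing)."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def is_monomorphic(genotypes):
    """True when only one allele is observed among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    alt_alleles = sum(called)
    return alt_alleles == 0 or alt_alleles == 2 * len(called)

def hwe_p_value(genotypes):
    """Chi-square (1 df) Hardy-Weinberg test; an assumed stand-in for the study's test."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(0), called.count(1), called.count(2)]
    q = sum(called) / (2 * n)      # frequency of the allele counted by the coding
    p = 1.0 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(genotypes, min_call_rate=0.90, hwe_threshold=1e-4):
    """Apply the three SNP-level edits with the thresholds given in the text."""
    if call_rate(genotypes) <= min_call_rate:
        return False
    if is_monomorphic(genotypes):
        return False
    return hwe_p_value(genotypes) >= hwe_threshold
```

In the study the Hardy-Weinberg edit was applied within each breed, so a sketch like this would be run on one breed's genotypes at a time.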
Definition of a run of homozygosity
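The minimum number of SNPs that constituted a ROH (l) was calculated as
$$l = \frac{\log_e\left(\alpha / (n_s \cdot n_i)\right)}{\log_e\left(1 - \overline{het}\right)}$$
where $n_s$ is the number of SNPs per individual, $n_i$ is the number of individuals, $\alpha$ is the percentage of false positive ROH (set to 0.05 in the present study) and $\overline{het}$ is the mean SNP heterozygosity across all SNPs. To exclude very short and common ROH that occur throughout the genome due to LD, a minimum ROH length of 500 kb was set.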
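As a rough numerical illustration of how the quantities defined above combine, the sketch below assumes the minimum run length takes the form l = ln(α / (ns · ni)) / ln(1 − het), which is the form consistent with the four symbols listed. The function name and the example inputs are hypothetical; in particular, the mean heterozygosity of the study data is not given in this text, so the numbers below are not the study's.

```python
import math

def min_snps_per_roh(n_snps, n_individuals, mean_het, alpha=0.05):
    """Minimum run length (in SNPs) so that chance ROH stay below alpha.

    Assumed form: l = ln(alpha / (n_snps * n_individuals)) / ln(1 - mean_het).
    """
    return math.log(alpha / (n_snps * n_individuals)) / math.log(1.0 - mean_het)

# Hypothetical inputs, chosen only to show the arithmetic.
example = min_snps_per_roh(n_snps=1000, n_individuals=10, mean_het=0.5)
required = math.ceil(example)   # round up to a whole number of SNPs
```

Lower mean heterozygosity makes homozygous stretches more likely by chance, so the required run length grows as heterozygosity falls.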
For analysis of the HD panel genotypes, the minimum SNP density was set at 1 SNP every 50 kb to ensure that low SNP density did not distort ROH length, a minimum run length of 58 SNPs was needed to produce <5% randomly generated ROH, and the maximum gap between two consecutive homozygous SNPs in a run was set at 100 kb. In the analysis of the reduced HD panel and the Bovine SNP50 panel, the minimum SNP density was altered to 1 SNP every 120 kb, no restriction was placed on the minimum number of SNPs in a ROH, and the maximum gap length between two consecutive homozygous SNPs in a run was kept at the default value of 1000 kb to account for the lower genotype density.
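To make the role of these thresholds concrete, the following sketch scans one chromosome of one animal for uninterrupted homozygous runs and keeps those that satisfy the HD panel settings. It is a deliberately simplified stand-in: it uses a plain consecutive-run scan rather than the sliding window approach referred to in this section, it allows no heterozygous or missing calls inside a run, and the function name, argument names and genotype coding (0/1/2, `None` for missing) are assumptions.

```python
def find_roh(positions_bp, genotypes, min_length_kb=500, min_snps=58,
             max_gap_kb=100, max_kb_per_snp=50):
    """Return (start_bp, end_bp, n_snps) for each run passing the thresholds.

    positions_bp must be sorted; genotypes are 0/1/2 with None for missing.
    Any heterozygous or missing call ends the current run.
    """
    runs = []
    start = None                    # index of the first SNP in the open run
    n = len(positions_bp)
    for i in range(n + 1):          # i == n is a sentinel that closes the last run
        homozygous = i < n and genotypes[i] in (0, 2)
        gap_too_big = (start is not None and i < n and
                       positions_bp[i] - positions_bp[i - 1] > max_gap_kb * 1000)
        if start is not None and (not homozygous or gap_too_big):
            first, last = positions_bp[start], positions_bp[i - 1]
            n_snps = i - start
            length_kb = (last - first) / 1000
            if (n_snps >= min_snps and length_kb >= min_length_kb
                    and length_kb / n_snps <= max_kb_per_snp):
                runs.append((first, last, n_snps))
            start = None
        if homozygous and start is None:
            start = i
    return runs
```

Under the same assumptions, the reduced HD and SNP50 settings described above would correspond to calling `find_roh(..., min_snps=1, max_gap_kb=1000, max_kb_per_snp=120)`.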
Runs of homozygosity were identified for each animal separately on the HD and reduced HD panels. For comparative purposes only, and in order to limit bias, the HD panel settings were altered to match those of the reduced panel: the maximum gap length was set to 1000 kb, the SNP density to 1 SNP every 120 kb, and no restriction was placed on the minimum number of SNPs that constituted a ROH. To establish whether the reduced density also assigns ROH to the correct length category, the extent to which reduced panel ROH were assigned to the same length category as the corresponding HD ROH was plotted.
Animals with overlapping ROH, and those ROH that were an allelic match, were also identified in the HD panel. Overlapping regions were identified using a sliding window approach and then, for each SNP, by calculating the proportion of homozygous windows in the population dataset that overlapped that position. For each chromosome, the percentage of animals sharing the region with the most overlapping ROH was plotted, and the percentage of these overlapping ROH with an allelic match of ≥95% was identified.
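The per-SNP overlap calculation can be approximated as below. This sketch counts, for each SNP position, the share of animals with a ROH covering it, which is a simplification of the window-based proportion described above. The data layout (a dict from animal to a list of `(start_bp, end_bp)` segments), the function names, and the definition of allelic match as the fraction of identical genotype calls across a shared region are assumptions made for illustration.

```python
def roh_coverage(snp_positions_bp, roh_by_animal):
    """Fraction of animals whose ROH cover each SNP position."""
    n_animals = len(roh_by_animal)
    coverage = []
    for pos in snp_positions_bp:
        covered = sum(
            any(start <= pos <= end for start, end in segments)
            for segments in roh_by_animal.values()
        )
        coverage.append(covered / n_animals)
    return coverage

def most_overlapped(snp_positions_bp, coverage):
    """Position and coverage of the SNP shared by the most animals' ROH."""
    best = max(range(len(coverage)), key=coverage.__getitem__)
    return snp_positions_bp[best], coverage[best]

def allelic_match(genos_a, genos_b):
    """Fraction of jointly called SNPs at which two animals carry the same genotype."""
    pairs = [(a, b) for a, b in zip(genos_a, genos_b)
             if a is not None and b is not None]
    return sum(a == b for a, b in pairs) / len(pairs)
```

With these helpers, two overlapping ROH would count as an allelic match when `allelic_match(...) >= 0.95` over the shared region.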
The percentage of each breed population with ROH present in each ROH length category was calculated, as well as the mean overall ROH sum per animal for each breed. The mean sum of ROH within each ROH length category was also calculated by summing all ROH per animal in each length category and averaging this over the breed population. The percentage of SNP involvement in ROH was calculated by counting the number of times a SNP appeared in a ROH in the population dataset.
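The per-category summary described above can be sketched as follows. The category boundaries used here (0.5, 1, 5, 10 and 20 Mb) are an assumption pieced together from lengths mentioned elsewhere in the text, not a statement of the study's exact bins, and the function and variable names are hypothetical. Animals with no ROH in a category still count in the breed denominator, matching the idea of averaging over the breed population.

```python
import bisect
from collections import Counter, defaultdict

# Hypothetical category edges in Mb; the lowest edge is the 0.5 Mb minimum ROH length.
BIN_EDGES_MB = [0.5, 1, 5, 10, 20]
BIN_LABELS = ["0.5-1", "1-5", "5-10", "10-20", "20+"]

def length_category(length_mb):
    """Label of the half-open bin [low, high) containing length_mb, or None if too short."""
    idx = bisect.bisect_right(BIN_EDGES_MB, length_mb) - 1
    return BIN_LABELS[idx] if idx >= 0 else None

def mean_roh_sum_by_breed(breed_of, rohs):
    """Mean summed ROH length (Mb) per animal, keyed by (breed, length category).

    breed_of maps every genotyped animal to its breed; rohs is a list of
    (animal, length_mb) records.
    """
    totals = defaultdict(float)
    for animal, length_mb in rohs:
        label = length_category(length_mb)
        if label is not None:
            totals[(breed_of[animal], label)] += length_mb
    breed_sizes = Counter(breed_of.values())
    return {key: total / breed_sizes[key[0]] for key, total in totals.items()}
```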
Inbreeding coefficient vs. runs of homozygosity
A genomic inbreeding measure based on ROH was calculated for each animal as
$$F_{ROH} = \frac{L_{ROH}}{L_{AUTO}}$$
in which $L_{ROH}$ is the sum of ROH per animal above a certain criterion length and $L_{AUTO}$ is the total length of autosome covered by SNPs. $L_{ROH}$ was calculated separately as the sum of all ROH >500 kb and the sum of all ROH >10,000 kb, where SNP autosomal genome coverage was 2,510,611 kb in the HD panel and 2,500,265 kb in the reduced HD panel. Pedigree-based inbreeding coefficients for all animals were calculated using the Meuwissen and Luo algorithm. Depth of known pedigree was measured in complete generation equivalents (CGE) for all animals, and correlations between all measures of inbreeding were calculated only on animals (n = 230) with a CGE value ≥6.
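A minimal sketch of the ROH-based statistic and its comparison with pedigree inbreeding is given below. It assumes the statistic is the ratio of summed ROH length to autosomal coverage, as the definitions of LROH and LAUTO imply, and it uses a Pearson correlation, which is an assumption because the text does not name the correlation type. Function names and inputs are hypothetical; the default coverage is the HD panel value given in the text.

```python
import math

HD_AUTOSOME_KB = 2_510_611   # HD panel autosomal coverage stated in the text

def f_roh(roh_lengths_kb, min_length_kb=500, autosome_kb=HD_AUTOSOME_KB):
    """Proportion of the covered autosome in ROH longer than min_length_kb."""
    l_roh = sum(length for length in roh_lengths_kb if length > min_length_kb)
    return l_roh / autosome_kb

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Calling `f_roh(lengths, min_length_kb=10_000)` gives the second variant described above, restricted to ROH longer than 10,000 kb; `pearson_r` would then be applied to the per-animal statistic and the pedigree coefficients of animals with CGE ≥6.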
ROH in animals with HD panel genotypes
Genome locations of ROH
Correlation between ROH and inbreeding coefficient
Validation of Bovine SNP50 density
ROH in Bovine SNP50 genotypes across the 42 breeds
Our findings show that ROH are frequent across all breeds and that certain ROH length categories can be used as an indication of consanguinity. They can also inform on breed population history as the effects of population bottlenecks, selection pressure and breeding management on the bovine genome may potentially leave an imprint on ROH length.
The bovine HD SNP assay allows an analysis of ROH at a density similar to that employed to generate genomic signatures of endogamy that differ markedly among human populations [1–3]. Moreover, cattle allow a first comprehensive testing of the inference drawn from ROH through comparison with estimates of inbreeding from calculations based on the detailed pedigree data available for many breeds. The strong correlation between the pedigree inbreeding coefficient and the sum of ROH of length > 0.5 Mb suggests that, in the absence of an animal’s pedigree data, the extent of a genome under ROH may be used to infer aspects of recent population history even from relatively few samples, as previously suggested by McQuillan et al. (2008). However, 44% of the variance in ROH distribution remains unexplained by pedigree inbreeding, which may partly reflect the limitations of ancestry recording in cattle, where founder animals are generally, and often inaccurately, assumed to be unrelated (McParland et al., 2007). Additionally, the propensity for multiple megabase-scale ancestral haplotypes in certain genome regions to persist even in outbred animals, perhaps due to localised low levels of recombination and high levels of LD, may contribute. Lastly, we note that pedigree relatedness gives an expected, not actual, proportion of genomic identity by descent among individuals, and it might be anticipated that genotype-based estimates provide greater accuracy on relatedness.
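The 44% figure follows directly from the reported correlation, assuming it is read as the complement of the squared correlation coefficient:

```latex
r = 0.75 \quad\Rightarrow\quad r^{2} = 0.5625, \qquad 1 - r^{2} = 0.4375 \approx 44\%
```

In other words, pedigree inbreeding accounts for roughly 56% of the variance in the ROH statistic, leaving about 44% to the other sources discussed above.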
Whereas HD SNP panels facilitate more accurate detection of ROH, the vast majority of cattle SNP genotype data, and emerging data in other livestock, are available at ~50,000 SNP density. It is therefore of interest whether these sparser genotypic data can reliably inform on ROH and inbreeding history. We found that HD ROH between 0.5 and 1 Mb in length were not accurately identified in such a reduced panel. However, the SNP50 density genotypes were sufficient to recognise almost all ROH >5 Mb, although they also have the potential to inflate ROH length. Importantly, ROH levels at the lower SNP density correlate equally well with the pedigree estimates of inbreeding. We conclude that this prevalent marker density is appropriate for identifying ROH.
We used three published [7–9] SNP50 genotype collections to examine patterns in ROH distribution and compare aspects of population history among a range of cattle breeds. The domestication process itself featured a limited sampling from the wild, with a more recent bottleneck detectable 50–100 generations ago, presumably corresponding to breed formation. However, this traditional breed formation is largely a European phenomenon and its absence is most apparent in the data from African cattle. These samples, including B. taurus breeds, humped B. indicus breeds and indicine/taurine hybrids, tended toward low levels of ROH per genome, reflecting traditional management practices in Africa, which are characterised by less controlled mating.
An open village breeding system may also predispose to random consanguineous matings, and many African breeds show outlying highly inbred individuals (Figure 5). The length distribution of ROH can help to distinguish different types of parental relatedness. Samples from human populations where cousin marriage is common show an excess of long ROH, whereas Papuan and Melanesian human populations, for example, show an excess of shorter ROH, consistent with the effects of reduced population size and isolation rather than first degree relative unions (Kirin et al. 2010). Figure 6 compares the contributions of long (>20 Mb) and short ROH to breed homozygosity in order to differentiate the effects of ancient and more recent relatedness among ancestors. Here, the three African taurine breeds with higher homozygosity (Oulmes zaer, Somba and Lagune) clearly show a strong influence of ROH longer than 20 Mb and hence of recent inbreeding. We note that Gautier et al. (2009) reported a high FIS value, as well as extensive linkage disequilibrium, within the Lagune breed.
Zebu-taurus hybridisation is also a dynamic and contemporary process within Africa [18–20]. This acts to increase genetic diversity and contributes to the interruption of stretches of homozygous genotypes within individuals. The effects of this process are evident in three hybrid breeds which show the lowest extent of ROH among African breeds. These include the Kuri, where previous work has shown a near 50:50 genetic admixture between the breed and surrounding zebu, and the Sheko, where the original taurine African Y chromosome is in danger of disappearing due to the use of zebu bulls.
Within European breeds, British breeds tended toward higher quantities of ROH, reflecting the results of previous microsatellite research, in which British Isles breeds had lower observed heterozygosities and gene diversities than the other Mediterranean and Northern European breeds analysed. The Channel Island breeds showed a strong influence of long ROH, reflecting their unusually closed population histories due to strict importation restrictions implemented on both Jersey and Guernsey during the 1800s.
The zebu breeds represented in this study have contrasting histories. The mainland African zebu breeds (Bororo and Fulani), which are products of ancient introductions from South Asia and are all hybrids to some extent, had much lower average quantities of ROH than the American zebu breeds analysed (Figure 6). Within the American and Madagascan zebu populations, a stronger homozygosity signal, with a weighting toward shorter ROH, suggests that these breeds were initially established by small founding populations but were not particularly affected by recent inbreeding (Figure 6). The initial introduction of the now prolific zebu animals in the Americas featured very limited numbers during the 19th and 20th centuries, and Madagascan zebu were founded by ancient importations from Asia and East Africa which were probably limited in scope due to the isolation of the island [20, 24].
The ascertainment bias towards European Bos taurus breeds that is associated with the Bovine SNP50 genotyping chip does not seem to invalidate the trends in ROH levels observed here, as ROH levels were in fact higher in those breeds with a higher number of polymorphic SNPs, as validated by Illumina (Additional file 7). Also, the long ROH (>20 Mb) found, for example, in many of the less polymorphic African village breeds (Oulmes zaer, Somba and Lagune) are unlikely to be artefactual, given the vanishingly small probability of long contiguous stretches of homozygous SNPs occurring by chance. However, some bias may exist in the Bos indicus ROH levels: ROH amounts may be overestimated because of the low number of polymorphic markers found in these breeds, a consequence of the design of the genotyping chip. Some caution must therefore be taken when inferring ROH levels within these breeds. The bovine HD genotyping chip was designed from a more comprehensive range of breeds, comprising several temperate and tropically adapted Bos taurus, Bos indicus and hybrid breeds, and thus does not exhibit the same level of ascertainment bias.
The Hapmap population data also allow comparison with an alternative inference of past population size. Linkage disequilibrium may be used to infer past population size, where higher r² indicates lower effective population size and LD at longer genetic distances corresponds to younger time depths [26, 27]. Interestingly, the Hapmap breed samples analysed here show a strikingly similar ranking in LD at distances of >200 kb to that which they show in average ROH.
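The link between LD and past population size can be illustrated with a commonly used approximation, E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the genetic distance in Morgans, together with the rule of thumb that LD at distance c reflects population size roughly 1 / (2c) generations ago. The text does not say which estimator the cited work [26, 27] uses, so this choice, the function names and the example values are assumptions made purely for illustration.

```python
def ne_from_r2(r2, c_morgans):
    """Effective population size implied by E[r2] ~ 1 / (1 + 4 * Ne * c)."""
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)

def generations_ago(c_morgans):
    """Approximate time depth (generations) probed by LD at distance c."""
    return 1.0 / (2.0 * c_morgans)

# Hypothetical example: mean r2 of 0.2 between markers 1 cM (0.01 Morgans) apart.
ne_example = ne_from_r2(0.2, 0.01)
depth_example = generations_ago(0.01)
```

Under this approximation, a higher r² at a given distance implies a smaller effective population size, and longer distances (larger c) correspond to more recent generations, which is the direction of inference described above.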
Analyses of human ROH have previously established a correlation between extensive LD, locally low rates of recombination and a high incidence of homozygous runs. Intense selection in cattle has possibly acted to maintain long homozygous tracts. Previous work carried out in over 500 animals from 8 breeds noted that high levels of LD, particularly in the Holstein breed, existed on chromosomes 14 and 16, the two chromosomes with the highest proportions of ROH in our study. Conversely, chromosome 12 was found to have higher than average recombination rates and lower levels of LD (r² < 0.2) than the majority of chromosomes and, interestingly, showed the highest proportion of SNPs uninvolved in a ROH within our sample population. The existence of recombination hotspots throughout the genome can also affect ROH: multiple genomic regions that remained uninvolved in any ROH, such as those on chromosomes 12 and 23, were found to be well documented human and cattle recombination hotspots [29–31].
The existence of QTL in ROH has been well documented in human studies [5, 6, 32]. Here, several of the highly involved genomic regions located on chromosomes 7, 14, 16 and 18 (Figure 2) potentially contain genes of importance in cattle, with associations ranging from immunity through to carcass and dystocia related traits [33–35], when explored using three QTL databases available online (http://genomes.sapac.edu.au/bovineqtl/index.html, http://www.animalgenome.org/QTLdb/cattle.html, http://www.ncbi.nlm.nih.gov/). In particular, chromosomes 9 and 5, which had the highest numbers of long ROH (>20 Mb), are well documented to contain QTL pertaining to milk fat yield and weight related traits respectively [36–39].
ROH analysis quantifies a feature of genomic variation that may be used to infer population history, to test for association with important production and disease traits and perhaps to detect signatures of selection. We show that ROH analysis in cattle provides a sufficient predictor of the pedigree inbreeding coefficient and that the prevalent SNP50 genotyping array may be a sufficient tool to predict ROH in the genome. Patterns of ROH may be decomposed to highlight the effects of recent and more ancient ancestral relatedness, and they match known aspects of breed history in a sample set of wide provenance.
This work was supported by Science Foundation Ireland principal investigator award grant number 09/IN.1/B2642. We are grateful to those researchers who have made SNP50 genotypes used in this work publicly available. We are also grateful to the Irish Cattle Breeding Federation for access to the pedigree data.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.