Probe-based association analysis identifies several deletions associated with average daily gain in beef cattle
Average daily gain (ADG) is an important trait that contributes to production efficiency and economic benefits in the beef cattle industry. The molecular mechanisms of ADG have not yet been fully explored because most recent association studies for ADG are based on SNPs or haplotypes. Here, we report a systematic CNV discovery and association analysis for ADG in Chinese Simmental beef cattle.
Our study identified 4912 nonredundant CNVRs with a total length of ~248.7 Mb, corresponding to ~8.9% of the cattle genome. Using probe-based CNV association, we identified 24 and 12 significant SNP probes within five deletions and two duplications for ADG, respectively. Among them, we found one common 89 kb deletion embedded in LHFPL Tetraspan Subfamily Member 6 (LHFPL6) at 22.9 Mb on BTA12, which occurs at high frequency (12.9%) across the population. A CNV selection test using the VST statistic suggested that this common deletion may be under positive selection in Chinese Simmental cattle. Moreover, this deletion did not overlap with any candidate SNP for ADG reported in previous SNP-based association studies, suggesting that it may contribute to ADG independently of those SNPs. In addition, we identified one rare deletion near the gene Growth Factor Receptor-bound Protein 10 (GRB10) at 5.1 Mb on BTA4 for ADG using both probe-based and region-based association approaches.
Our results provide valuable insights into the genetic basis of ADG in beef cattle, and these findings offer an alternative perspective for understanding the genetic mechanisms of complex traits in terms of copy number variations in farm animals.
Keywords: Copy number variation; Average daily gain; Probe-based association; Positive selection; Beef cattle
Abbreviations
ADG: Average daily gain
CNVR: Copy number variation region
BAF: Frequency of allele B
BTA: Bos taurus autosomes
CNV: Copy number variation
GWAS: Genome-wide association study
IGV: Integrative genomics viewer
LRR: Log R ratio
QTL: Quantitative trait locus
SNP: Single nucleotide polymorphism
Genomic structural variants mainly comprise copy number variations (CNVs) in the form of large-scale insertions and deletions, as well as inversions and translocations. CNVs involve more genomic sequence than single nucleotide polymorphisms (SNPs); thus they have potentially larger effects, including altering gene regulation and dosage, and they contribute to variation in gene expression and to normal phenotypic variability [2, 3, 4, 5].
High-throughput SNP genotyping arrays have been widely used in genome-wide studies. While these arrays have limited capacity to assess the effects of rare single-site variants, they can be readily used to identify large copy number variations, even if they occur in only a few subjects. There is substantial evidence that other genetic variants such as copy number variations may affect complex traits, including short stature and anthropometric traits in humans [7, 8]. For instance, one recent study suggested that a 45 kb deletion was associated with body mass index in humans, which may also reflect a neuronal influence of the deletion on body weight regulation. A previous CNV association analysis in European adults identified several genes (e.g., MC4R, FIBIN, and FMO5) harboring both common and rare variants that may affect body size and anthropometric traits.
Considerable attention has turned towards assessing the association between copy number variations and complex traits in farm animals using high-throughput arrays. In cattle, several studies have found that CNVs are likely to be associated with resistance to gastrointestinal nematodes in Angus [10, 11] and with residual feed intake, milk production and fertility traits in Holstein cows [12, 13, 14]. A recent study also described a 660 kb deletion with antagonistic effects on fertility and milk production in Nordic Red cattle [15]. Thus, detecting CNVs and identifying their potential associations has gradually become an alternative way to elucidate the genetic mechanisms of complex traits in farm animals.
Average daily gain (ADG) is generally recognized as an economically important growth trait that contributes to production benefits in the beef industry. Previous studies have identified many QTL regions associated with ADG in various populations [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. These studies used multiple methods, including SNP-based, haplotype-based and gene-based GWAS, to test for association with ADG. However, the molecular mechanisms of ADG have not yet been fully explored, partly because most recent studies of ADG are based on SNPs or haplotypes alone, and a systematic CNV-based association study for this complex trait is still missing.
In this study, we present a comprehensive CNV association analysis for ADG in Chinese Simmental beef cattle. Seven CNVs were found to be significantly associated with ADG using probe-based association analysis. Notably, we found one common 89 kb deletion embedded in LHFPL6 with high frequency and one rare deletion near GRB10 as potential candidate variants for ADG in Chinese Simmental cattle. Further analyses indicated that the common deletion may contribute an additional effect on ADG beyond that of SNPs.
We performed CNV analysis with the Illumina Bovine HD BeadChip in Chinese Simmental beef cattle. A total of 234,973 raw CNV events were generated using PennCNV v1.0.4 based on the UMD3.1 genome assembly. After quality control, 61,710 CNV events in 1079 individuals that met the quality thresholds were kept for subsequent analyses. On average, 57.2 CNV events were obtained for each individual, with an average length of 3.6 Mb (Additional file 1). These CNVs were merged into 4912 nonredundant copy number variation regions (CNVRs) with a total length of ~248.7 Mb, corresponding to ~8.9% of the cattle genome.
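The merging of individual CNV events into nonredundant CNVRs can be illustrated with a minimal sketch. The text does not state the merging rule that was used, so the sketch assumes that any two events on the same chromosome sharing at least 1 bp are merged; the function names and the toy coordinates are hypothetical.

```python
from collections import defaultdict


def merge_cnvs_into_cnvrs(cnvs):
    """Merge CNV events that overlap by >= 1 bp on the same chromosome.

    cnvs: iterable of (chrom, start, end) with inclusive coordinates.
    Returns a list of (chrom, start, end) regions sorted by chrom and start.
    The 1 bp overlap rule is an assumption, not taken from the paper.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    cnvrs = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the region being built: extend its end if needed.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        cnvrs.extend((chrom, s, e) for s, e in merged)
    return cnvrs


def total_length(regions):
    """Total number of bases covered by inclusive (chrom, start, end) regions."""
    return sum(end - start + 1 for _, start, end in regions)


# Toy events (made-up coordinates, for illustration only).
calls = [("12", 100, 200), ("12", 150, 300), ("12", 400, 500), ("4", 10, 20)]
cnvrs = merge_cnvs_into_cnvrs(calls)
covered = total_length(cnvrs)

# The per-individual average reported above follows from the counts in the text.
mean_events = round(61710 / 1079, 1)
```

With the toy input, the two overlapping events on chromosome 12 collapse into one region, while the other two events remain separate.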
Enrichment analysis using CNV-disrupting genes
We further investigated the gene-disrupting CNVs using the DAVID (Database for Annotation, Visualization and Integrated Discovery) system to test for functional enrichment of these genes. Duplications and deletions were considered separately in the current study. We obtained 1863 and 629 genes overlapping deletion and duplication regions, respectively (Additional file 2). For deletions, DAVID annotation showed a significant over-representation of genes related to antigen processing and presentation of peptide or polysaccharide antigen via MHC class II and to the MHC class II protein complex. For duplications, genes were enriched in the MHC class I protein complex, antigen processing and presentation of peptide antigen via MHC class I, immune response, antigen processing and presentation of peptide or polysaccharide antigen via MHC class II, and the MHC class II protein complex (Additional file 3).
CNVs overlap with QTL associated with ADG trait
We next explored the overlap between QTLs and CNV regions (requiring at least 1 bp of overlap). We retrieved autosomal QTL regions from QTLdb associated with the trait class ‘Average daily gain’. We found that 356 deletion and 135 duplication regions overlapped with the merged QTL regions for ADG. Among them, the deletion regions occupy ~14.13 Mb, while the duplication regions occupy ~4.08 Mb (Additional file 4). These findings imply that these CNVs could serve as new candidate markers to refine cattle QTLs after validation.
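The 1 bp overlap criterion between CNV regions and merged QTL regions can be sketched as follows. This is only an illustration under the assumption of inclusive coordinates on a shared assembly; the function names and the example intervals are hypothetical and are not taken from QTLdb.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least 1 bp."""
    return a_start <= b_end and b_start <= a_end


def cnvrs_overlapping_qtls(cnvrs, qtls):
    """Return the CNV regions that overlap any QTL region on the same chromosome.

    cnvrs, qtls: lists of (chrom, start, end) with inclusive coordinates.
    """
    hits = []
    for chrom, start, end in cnvrs:
        if any(
            chrom == q_chrom and overlaps(start, end, q_start, q_end)
            for q_chrom, q_start, q_end in qtls
        ):
            hits.append((chrom, start, end))
    return hits


# Made-up regions for illustration only.
cnvrs = [("12", 100, 300), ("12", 400, 500), ("4", 10, 20)]
qtls = [("12", 300, 350), ("4", 21, 90)]
hits = cnvrs_overlapping_qtls(cnvrs, qtls)
```

Here the first region touches a QTL at a single base (position 300) and is counted, whereas the region on chromosome 4 ends one base before the QTL starts and is not.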
ADG associated CNVs
[Table: Candidate copy number variation regions associated with average daily gain in beef cattle; columns include the count of significant probes]
Besides deletions, we also identified two candidate duplications for ADG. However, no gene was found within these duplication regions. In addition, we found one 125 kb duplication with a frequency of 0.74% in our population. A significant probe in this duplication is located upstream of R3HDM2, but it was the only significant probe detected for this duplication.
In addition, to assess the reliability of our CNV detection method, we randomly selected seven identified CNVs representing different types for quantitative PCR (qPCR) and examined eight samples that contain each of the seven CNVs. Two distinct pairs of primers were designed using Primer 3.0 for each detected CNV (Additional file 6). The validation rates of the eight samples varied from 71.43 to 100%, with an average of 85.71%, which is comparable to our earlier results and other studies [29, 30, 31, 32, 33].
Selection estimation and sequencing validation for one common deletion
Genome-wide association studies have remarkably advanced our understanding of the genetic basis of complex traits. However, these strategies cannot fully account for the overall heritability, as other genomic variants may contribute to these traits; thus the contribution of CNVs to the genetic mechanisms of complex traits needs further investigation [36, 37].
Although improvements in genotyping platforms and statistical approaches have facilitated the discovery of CNVs, integrating CNV analysis into GWAS for complex traits remains challenging. Even though CNVs may be in linkage disequilibrium (LD) with associated variants, identifying causal variants may still require considering CNVs alongside SNPs. Previous CNV association studies for complex traits in farm animals have mostly used common CNVs detected by multivariate analysis [11, 12, 38]. These approaches use the copy number analysis module (CNAM) under the multivariate option and thus facilitate the identification of common CNV segments. However, the CNAM algorithm forces CNV boundaries into a fixed window, which may cause boundary enforcement artifacts. In contrast to the CNAM method, the probe-based association implemented in ParseCNV was developed to facilitate data processing and improve transparency for CNV association studies. ParseCNV converts individual-level CNV calls into population-level probe-based CNV states, which facilitates the construction of variables for CNV-based association tests.
To systematically search for CNVs that contribute to the genetic architecture of ADG, we conducted a CNV-based genome-wide association study using the Illumina Bovine 770 K BeadChip in Chinese Simmental cattle. Our previous study identified 263 CNV regions (CNVRs) covering 35.48 Mb (1.41%) of the cattle genome in ~700 individuals. In the present study, the CNVRs covered 248.7 Mb, corresponding to 8.9% of the genome. This increase is probably due to the larger sample size used for CNV discovery. A large population can facilitate CNV-based GWAS and improve the detection of CNVs potentially associated with ADG. In addition, around 86% of the qPCR validation results were consistent with the PennCNV predictions. CNV annotation also indicated that significantly over-represented genes were related to receptor activity, immunity and antigen processing, which is consistent with previous CNV analyses in cattle and other mammals [30, 39, 40, 41, 42]. In total, using probe-based CNV association analysis, we identified 38 significant probes and 7 corresponding CNV regions associated with ADG. To our knowledge, this is the first report of CNVs associated with ADG in farm animals. Our previous study identified 40 significant SNPs and 7 prominent genes for ADG using multi-strategy GWAS in Chinese Simmental beef cattle. None of the SNPs, genes or regions from that SNP-based GWAS overlapped with the CNVs identified in the current study. Thus, the CNV deletions discovered in the present study might contribute to ADG independently of those SNP signals.
In total, we identified several candidate genes (e.g., LHFPL6, SORCS3, GRB10, CENPW, ESR1 and ATG14) within or near candidate CNVs for ADG. Among them, we found one common deletion embedded in LHFPL6 at 22.9 Mb on BTA12 with high frequency in the Chinese Simmental population. This gene is a member of the lipoma HMGIC fusion partner (LHFP) gene family; LHFP was reported to be fused to a high-mobility group gene in a translocation-associated lipoma. Mutations in LHFP-like genes have been linked to deafness in mice and humans [43, 44]. Moreover, we suspect that this high-frequency deletion arose under positive selection and may play an important role in complex traits. Our VST results also suggested that this deletion is significantly differentiated in Chinese Simmental cattle compared with native cattle. Therefore, this CNV may act as an important genomic variant under selection that contributes to ADG.
In addition, we identified one rare deletion near GRB10 at 5.1 Mb on BTA4 using both probe-based and region-based association analyses. GRB10 (growth factor receptor-bound protein 10) encodes an intracellular adaptor protein that acts as a negative regulator of insulin and insulin-like growth factor receptors to restrict fetal and placental growth during mammalian development [45, 46]. This gene has been identified as a candidate imprinted gene associated with growth-related traits in Irish Holstein-Friesian cattle [47, 48]. GRB10 has also been reported to be related to the development of fiber number in skeletal muscle and to milk tridecylic acid. However, functional characterization of the identified deletions requires further work, including third-generation sequencing and other experimental validation. Our analyses provide valuable insights into the missing heritability of ADG. To our knowledge, the present study provides the first case of association between CNVs and a quantitative trait in Chinese Simmental beef cattle. These results extend our understanding of CNVs in complex traits and point to the importance of using new methods that account for these variations in genome-wide association studies. Further functional studies and expression assays can be used to assess the biological effects of CNVs in candidate genes and to understand their contribution to complex traits in farm animals.
Our study identified 24 and 12 significant SNP probes within five deletions and two duplications for ADG, respectively. Among them, we found one common 89 kb CNV deletion embedded in LHFPL6 at 22.9 Mb on BTA12. This deletion did not overlap with any candidate SNP for ADG reported in previous SNP-based association studies, suggesting that it may contribute to ADG independently of those SNPs. In addition, we identified one rare deletion near GRB10 at 5.1 Mb on BTA4 for ADG using both probe-based and region-based association approaches. Our results provide valuable insights into the genetic basis of ADG in beef cattle, and these findings offer an alternative perspective for understanding the genetic mechanisms of complex traits in terms of copy number variations in farm animals.
No ethics statement was required for the collection of genetic material. The data from animals included in this study were derived from previous analyses that obtained specific permissions.
Samples and phenotype data
Samples were genotyped using the Illumina Bovine HD SNP array. A more detailed description of the original array data set can be found in our previous publication. The resource population consisted of 1173 Simmental cattle born between 2008 and 2013 in Ulgai, Inner Mongolia. After weaning, all calves were transferred to a fattening farm in Beijing and fattened in the same pens for 8 to 12 months. All animals were kept under the same feeding and management conditions, and ADG was estimated over the fattening period. A distribution test showed that ADG followed a normal distribution, and analysis of variance (ANOVA) showed that farm, sex, year of measurement and fattening days had significant effects (P < 0.01). Thus, these factors were adjusted for in a linear regression model, and the resulting trait residuals were used in the ADG association tests.
PennCNV v1.0.4 was used to identify CNVs across the autosomes. PennCNV incorporates both the Log R Ratio (LRR) and the frequency of allele B (BAF) for CNV detection. CNV calling followed the previous study by Yang et al. The final CNV events were obtained by keeping high-quality samples according to the following criteria: call rate > 0.95, standard deviation (SD) of LRR < 0.35, and a GC waviness factor threshold of 0.005.
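The sample-level quality criteria above can be written as a simple filter. The sketch below is not PennCNV code; it assumes that per-sample summary values are already available and that the waviness criterion means an absolute GC waviness factor below 0.005, which the text does not state explicitly. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    sample_id: str
    call_rate: float
    lrr_sd: float  # standard deviation of the Log R Ratio
    wf: float      # GC waviness factor (may be negative)


def passes_qc(s, min_call_rate=0.95, max_lrr_sd=0.35, max_abs_wf=0.005):
    """Apply the three thresholds from the text.

    Using |WF| < max_abs_wf is an assumed reading of the waviness criterion.
    """
    return (
        s.call_rate > min_call_rate
        and s.lrr_sd < max_lrr_sd
        and abs(s.wf) < max_abs_wf
    )


# Made-up samples for illustration only.
samples = [
    SampleQC("A", 0.99, 0.20, 0.001),   # passes all three criteria
    SampleQC("B", 0.94, 0.20, 0.001),   # fails call rate
    SampleQC("C", 0.99, 0.40, 0.001),   # fails LRR SD
    SampleQC("D", 0.99, 0.20, -0.010),  # fails waviness
]
kept = [s.sample_id for s in samples if passes_qc(s)]
```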
CNV association analysis
To identify CNV regions associated with ADG, CNV calls and quality measures were translated to the probe level using ParseCNV. ParseCNV provides an integrative CNV association method that converts CNV calls into probe-based statistics for individual CNVs. Because CNV boundaries vary across individuals, the start and end points of a CNV may be unclear and calls from different individuals cannot be reliably classified as the same or different CNVs; CNV association tests were therefore performed at the probe level.
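The idea of translating individual-level CNV calls to the probe level can be sketched as below. This is a simplified illustration of the concept, not ParseCNV's implementation; the function name, the call tuple layout and the toy coordinates are assumptions.

```python
import bisect


def probe_carrier_counts(probe_positions, calls, cnv_type):
    """Count, for each probe, how many individuals carry a CNV of the given
    type that covers the probe.

    probe_positions: sorted probe coordinates on one chromosome.
    calls: list of (sample_id, start, end, type), type being "del" or "dup",
           with inclusive coordinates on the same chromosome.
    """
    carriers = [set() for _ in probe_positions]
    for sample_id, start, end, ctype in calls:
        if ctype != cnv_type:
            continue
        # Indices of the first and one-past-last probes inside [start, end].
        lo = bisect.bisect_left(probe_positions, start)
        hi = bisect.bisect_right(probe_positions, end)
        for i in range(lo, hi):
            carriers[i].add(sample_id)
    return [len(c) for c in carriers]


# Made-up probes and calls for illustration only.
probes = [100, 200, 300, 400, 500]
calls = [
    ("s1", 150, 350, "del"),
    ("s2", 200, 450, "del"),
    ("s3", 90, 210, "dup"),
]
del_counts = probe_carrier_counts(probes, calls, "del")
dup_counts = probe_carrier_counts(probes, calls, "dup")
```

Although the two deletion calls have different boundaries, the probes at 200 and 300 are covered by both, so the probe-level counts capture the shared signal without deciding whether the calls are the same CNV.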
We tested the frequency of SNP probes affected by each CNV type separately, i.e., deletions, duplications and genomic regions affected by both types of CNV. The association between CNV carrier status and ADG across the population was evaluated using the linear mixed model implemented in EMMAX. Relatedness among individuals, estimated from SNP genotypes, was modeled as a random effect. For CNV association, a suggestive genome-wide threshold was used, as suggested previously. The probe-based statistical significance (−log10 P-value) of neighboring probes was calculated using EMMAX. Neighboring SNP probes with comparable significance were then collapsed into CNVRs, each spanning consecutive probes (at least two probes). The lowest local P-value among the identified probes was used to represent the significance of association of the CNVR. Multiple-testing correction was carried out for each probe using the qvalue package, and a q value < 0.05 was used as the significance threshold.
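Collapsing neighboring significant probes into regions can be sketched as follows. The sketch assumes that "comparable significance" means every probe in a run passes the q value threshold, and that probes are already sorted by position on one chromosome; both are simplifying assumptions, and the function names and example values are hypothetical.

```python
def summarize_run(run):
    """Summarize a run of consecutive significant probes as one region."""
    return {
        "start": run[0][0],
        "end": run[-1][0],
        "n_probes": len(run),
        "min_p": min(p for _, p, _ in run),  # lowest local P-value
    }


def collapse_significant_probes(probes, q_threshold=0.05, min_probes=2):
    """Collapse runs of consecutive significant probes into regions.

    probes: list of (position, p_value, q_value), sorted by position.
    Runs shorter than min_probes are discarded.
    """
    regions, run = [], []
    for probe in probes:
        if probe[2] < q_threshold:
            run.append(probe)
            continue
        if len(run) >= min_probes:
            regions.append(summarize_run(run))
        run = []
    if len(run) >= min_probes:
        regions.append(summarize_run(run))
    return regions


# Made-up probe statistics for illustration only.
probes = [
    (100, 0.2, 0.5),
    (200, 1e-5, 0.01),
    (300, 1e-6, 0.004),
    (400, 0.3, 0.6),
    (500, 1e-4, 0.03),
]
regions = collapse_significant_probes(probes)
```

In this toy input the probes at 200 and 300 form one region represented by the lower of their two P-values, and the isolated significant probe at 500 is dropped by the two-probe minimum.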
Region-based CNV association analyses
We next used the density of probes within CNV regions to assess possible enrichment of region-based CNVs. The cumulative burden of CNVs can be estimated at the region level using the approach implemented in CNVtools. It combines the information across CNV probes to obtain a one-dimensional signal for each sample using principal component analysis and the Bayesian information criterion. A copy number genotype was then assigned to each locus for each individual to test for genetic association with a quantitative trait using a standard regression approach. The exact boundaries of the candidate regions were based on the BosTau6 (UMD 3.1) reference assembly.
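The final step of the region-based test, regressing the trait on an assigned copy number genotype, can be illustrated with ordinary least squares. The sketch below covers only that step and omits the signal summarization and genotype assignment performed by CNVtools; the function name and all values are made up for illustration.

```python
def linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical copy number genotypes (1 = heterozygous deletion, 2 = normal)
# and hypothetical adjusted ADG residuals for six animals.
copy_number = [2, 2, 1, 1, 2, 1]
adg_residual = [0.10, 0.00, -0.20, -0.10, 0.05, -0.15]

slope, intercept = linear_regression(copy_number, adg_residual)
```

A positive slope in this toy data means that each additional copy is associated with a higher trait residual, i.e. deletion carriers have lower values.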
Pathway analysis and CNV genes annotation
We searched for genes affected by the identified CNVs using the UCSC Genome Browser (UMD 3.1). Any RefSeq gene that was either fully contained in or partially disrupted by a CNV was considered CNV-affected. To evaluate the effects of disrupted genes in functionally defined molecular pathways, we investigated the CNV-disrupting genes using the DAVID gene functional classification system. Deletions and duplications were considered separately. To avoid false positives, we only considered enriched pathways with at least two genes and a P value < 0.05 after Bonferroni correction for multiple testing.
CNVs overlapped with QTLs associated with ADG traits
QTL information was downloaded from the cattle QTLdb. We merged all QTL regions into a set of unique non-redundant regions. QTL coordinates based on Btau_4.0 were converted to UMD3.1 using liftOver with a relaxed threshold (the minimum ratio of bases that must remap was set to 75%).
Next generation sequencing analysis
Genomic DNA from four Chinese Simmental bulls was extracted from blood samples using a TIANamp Blood DNA Kit (Tiangen Biotech Company Limited, Beijing, China), and DNA with an A260/280 ratio between 1.8 and 2.0 was used for library construction. Two paired-end libraries were constructed for each individual with a read length of 2 × 150 bp, and whole-genome sequencing was performed on Illumina HiSeq 2500 instruments (Illumina Inc., San Diego, CA, USA). All procedures followed the manufacturer’s standard protocols. Each sample was sequenced to an approximate coverage of 20X. We removed low-quality reads using the following filters: (1) reads with an adaptor, (2) reads containing more than 10% unknown bases, and (3) reads containing more than 50% low-quality bases. After filtering, we used BWA v0.7.8 with the parameters (mem -t 4 -k 32 -M) to align the sequences to the UMD3.1 genome assembly.
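The three read filters can be expressed as a single predicate. This is a minimal sketch: the text does not define what counts as a low-quality base, so the Phred cutoff used here (`lowq_cutoff=5`) is purely an assumption, and adaptor detection is represented by a precomputed flag rather than implemented.

```python
def keep_read(seq, quals, has_adapter,
              max_n_frac=0.10, max_lowq_frac=0.50, lowq_cutoff=5):
    """Return True if a read passes the three filters described in the text.

    seq: read sequence; quals: per-base Phred scores (same length as seq);
    has_adapter: result of a separate adaptor scan (not implemented here).
    lowq_cutoff is an assumed definition of a low-quality base.
    """
    if has_adapter:
        return False
    n = len(seq)
    if n == 0:
        return False
    # Filter 2: more than 10% unknown (N) bases.
    if seq.upper().count("N") / n > max_n_frac:
        return False
    # Filter 3: more than 50% low-quality bases.
    low = sum(1 for q in quals if q <= lowq_cutoff)
    if low / n > max_lowq_frac:
        return False
    return True


good = keep_read("ACGTACGTAC", [30] * 10, has_adapter=False)
with_adapter = keep_read("ACGTACGTAC", [30] * 10, has_adapter=True)
too_many_n = keep_read("ACGTNNACGT", [30] * 10, has_adapter=False)
low_quality = keep_read("ACGTACGTAC", [2] * 6 + [30] * 4, has_adapter=False)
```

For paired-end data, a pair would typically be discarded if either mate fails; that rule is also an assumption and is left out of the sketch.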
Quantitative PCR validation
Quantitative PCR (qPCR) was used to validate seven associated CNVs detected by PennCNV. For each CNV, primers were designed using the Primer3 web tool (http://bioinfo.ut.ee/primer3-0.4.0/primer3/). To check amplification efficiency, a standard curve was generated for each primer pair using serially diluted genomic DNA from a common cattle sample as template. The Basic Transcription Factor 3 (BTF3) gene was selected as the control, assuming two copies of the DNA segment. qPCR was conducted in triplicate in a total reaction volume of 20 μL in 96-well plates, using SYBR Green chemistry on an ABI StepOnePlus Real-Time PCR System (Thermo Fisher Scientific). The thermal cycling conditions were as follows: 2 min at 95 °C, followed by 40 cycles of 95 °C for 10 s and 60 °C for 40 s. We calculated the relative copy number for each selected region using the 2^(−ΔΔCT) method. First, the average CT value of the three replicates of each sample was calculated and normalized against the control gene (ΔCT); then the ΔΔCT value was estimated between the CNV carrier sample and a reference sample with normal copy number.
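The relative copy number calculation can be sketched as below. The sketch assumes roughly 100% amplification efficiency for both primer pairs and scales the fold change by the two copies assumed for the reference sample; the function name and all CT values are hypothetical.

```python
def relative_copy_number(ct_target_test, ct_control_test,
                         ct_target_ref, ct_control_ref, ref_copies=2):
    """Estimate copy number with the 2^(-ddCT) method.

    Each argument is a list of replicate CT values. "target" is the CNV
    region, "control" is the two-copy control gene, "test" is the putative
    CNV carrier and "ref" is a sample with normal copy number.
    """
    def mean(values):
        return sum(values) / len(values)

    # dCT: target normalized against the control gene within each sample.
    dct_test = mean(ct_target_test) - mean(ct_control_test)
    dct_ref = mean(ct_target_ref) - mean(ct_control_ref)
    # ddCT: carrier sample relative to the normal reference sample.
    ddct = dct_test - dct_ref
    return ref_copies * 2 ** (-ddct)


# Made-up CT values: the target amplifies one cycle later in the test sample.
copies = relative_copy_number(
    ct_target_test=[26.0, 26.1, 25.9],
    ct_control_test=[24.0, 24.0, 24.0],
    ct_target_ref=[25.0, 25.0, 25.0],
    ct_control_ref=[24.0, 24.0, 24.0],
)
```

A one-cycle delay relative to the reference halves the estimated dosage, giving about one copy, which is what a heterozygous deletion would be expected to show.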
The authors would like to thank the staff at the cattle experimental units in Beijing and Ulgai for caring for the animals and collecting biological samples.
This study was supported by the National Natural Science Foundation of China (31702084) and Agricultural Science and Technology Innovation Program of China (ASTIP-IAS-TS-9, ASTIP-IAS-03 and ASTIP-IAS-TS-16) for the design of the study and sample collection. Also, this study was supported by the Elite Youth Program in Chinese Academy of Agricultural Sciences for the data analysis and interpretation of the study.
Availability of data and materials
Datasets are available from the Dryad Digital Repository (doi: https://doi.org/10.5061/dryad.4qc06).
Authors' contributions
Conceived and designed the experiments: LYX, GEL and JYL. Performed the experiments: LW, LPZ and XG. Analyzed the data: LY, HJG and BZ. Contributed reagents/materials/analysis tools: LY, YC and JYL. Wrote the paper: LYX, GEL and JYL. All authors have read and approved the manuscript.
Ethics approval and consent to participate
No ethics statement was required for the collection of genetic material. The data from animals included in this study were derived from previous analyses that obtained specific permissions.
Consent for publication
Competing interests
The authors declare that they have no competing interests except that George Liu is a member of the editorial board (Associate Editor) of this journal.
10. Hou Y, Liu GE, Bickhart DM, Matukumalli LK, Li C, Song J, Gasbarre LC, Van Tassell CP, Sonstegard TS. Genomic regions showing copy number variations associate with resistance or susceptibility to gastrointestinal nematodes in Angus cattle. Funct Integr Genomics. 2012;12(1):81–92.
15. Kadri NK, Sahana G, Charlier C, Iso-Touru T, Guldbrandtsen B, Karim L, Nielsen US, Panitz F, Aamand GP, Schulman N, et al. A 660-Kb deletion with antagonistic effects on fertility and milk production segregates at high frequency in Nordic red cattle: additional evidence for the common occurrence of balancing selection in livestock. PLoS Genet. 2014;10(1):e1004049.
17. Hoshiba H, Setoguchi K, Watanabe T, Kinoshita A, Mizoshita K, Sugimoto Y, Takasuga A. Comparison of the effects explained by variations in the bovine PLAG1 and NCAPG genes on daily body weight gain, linear skeletal measurements and carcass traits in Japanese black steers from a progeny testing program. Anim Sci J. 2013;84(7):529–34.
21. Lindholm-Perry AK, Kuehn LA, Oliver WT, Kern RJ, Cushman RA, Miles JR, McNeel AK, Freetly HC. DNA polymorphisms and transcript abundance of PRKAG2 and phosphorylated AMP-activated protein kinase in the rumen are associated with gain and feed intake in beef steers. Anim Genet. 2014;45(4):461–72.
22. Lindholm-Perry AK, Sexten AK, Kuehn LA, Smith TP, King DA, Shackelford SD, Wheeler TL, Ferrell CL, Jenkins TG, Snelling WM, et al. Association, effects and validation of polymorphisms within the NCAPG - LCORL locus located on BTA6 with feed intake, gain, meat and carcass traits in beef cattle. BMC Genet. 2011;12:103.
24. Lindholm-Perry AK, Kuehn LA, Snelling WM, Smith TP, Ferrell CL, Jenkins TG, King DA, Shackelford SD, Wheeler TL, Freetly HC. Genetic markers on BTA14 predictive for residual feed intake in beef steers and their effects on carcass and meat quality traits. Anim Genet. 2012;43(5):599–603.
47. Magee DA, Sikora KM, Berkowicz EW, Berry DP, Howard DJ, Mullen MP, Evans RD, Spillane C, MacHugh DE. DNA sequence polymorphisms in a panel of eight candidate bovine imprinted genes and their association with performance traits in Irish Holstein-Friesian cattle. BMC Genet. 2010;11:93.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.