Computational detection and experimental validation of segmental duplications and associated copy number variations in water buffalo (Bubalus bubalis)


Duplicated sequences are an important source of gene evolution and structural variation within mammalian genomes. Using a read depth approach based on next-generation sequencing, we performed a genome-wide analysis of segmental duplications (SDs) and associated copy number variations (CNVs) in the water buffalo (Bubalus bubalis). By aligning short reads of Olimpia (the reference water buffalo) to the UMD3.1 cattle genome, we identified 1,038 segmental duplications comprising 44.6 Mb (equivalent to ~1.73% of the cattle genome) of autosomal and X chromosomal sequence in the buffalo genome. We experimentally validated 70.3% (71/101) of these duplications using fluorescent in situ hybridization (FISH). We also detected a total of 1,344 CNV regions across 14 additional water buffaloes, amounting to 59.8 Mb of variable sequence, or the equivalent of 2.2% of the cattle genome. The CNV regions overlap 1,245 genes that are significantly enriched for specific biological functions, including immune response, oxygen transport, sensory system and signal transduction. Additionally, we performed array comparative genomic hybridization (aCGH) experiments using the 14 water buffaloes as test samples and Olimpia as the reference. Using a linear regression model, we observed a high Pearson correlation (r = 0.781) between the log2 ratios of the copy number estimates and the log2 ratios of the aCGH probes. We further designed quantitative PCR (qPCR) assays to confirm CNV regions within or near annotated genes and found 74.2% agreement with our CNV predictions. These results confirm the presence of sub-chromosome-scale structural rearrangements in cattle and water buffalo. This information on genome variation will be of value for evolutionary and phenotypic studies and may be useful for selective breeding of both species.
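As a rough illustration of the comparison described above, the following minimal Python sketch computes log2 copy number ratios for a test animal against the reference and correlates them with aCGH log2 ratios. It is not the authors' pipeline: the function names `log2_ratio` and `pearson_r` and every numeric value in the example lists are hypothetical, chosen only to show the arithmetic. The last line reproduces the FISH validation rate quoted above (71 of 101).

```python
import math


def log2_ratio(test_cn, ref_cn):
    """log2 of the test-to-reference copy number ratio (both must be > 0)."""
    return math.log2(test_cn / ref_cn)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-region copy number estimates (NOT data from the study).
test_cn = [2.0, 4.0, 1.0, 3.0, 2.0]   # one test animal
ref_cn = [2.0, 2.0, 2.0, 2.0, 2.0]    # reference animal
# Hypothetical mean aCGH log2 ratios for the same regions.
acgh_log2 = [0.05, 0.80, -0.70, 0.45, -0.10]

seq_log2 = [log2_ratio(t, r) for t, r in zip(test_cn, ref_cn)]
r = pearson_r(seq_log2, acgh_log2)

# FISH validation rate reported in the text: 71 of 101 duplications.
validation_pct = round(100 * 71 / 101, 1)
```

With real data, each entry would correspond to one CNV region (or one probe) measured by both platforms, and regions with a zero copy number estimate would need separate handling, because the log2 ratio is undefined there.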

Data Availability

The aCGH raw data from the 14 water buffaloes have been submitted to the NCBI under GEO accession ID GSE118117. All 101 FISH results are posted on



We thank Reuben Anderson and Alexandre Dimtchev for technical assistance. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The USDA is an equal opportunity provider and employer.


GEL was partially supported by appropriated project 1265-3200-083-00D from the USDA Agricultural Research Service (Beltsville Agricultural Research Center), AFRI grant number 2013-67015-20951 from the USDA National Institute of Food and Agriculture (NIFA) Animal Genome and Reproduction Programs, and BARD grant number US-4997-17 from the US-Israel Binational Agricultural Research and Development (BARD) Fund. WL and DMB were supported by appropriated project 5090-31000-024-00-D from the USDA Agricultural Research Service (Dairy Forage Research Center). WYL and JLW are funded by the JS Davies Bequest to the University of Adelaide.

DMB and GEL conceived and designed the experiments. JLW, DI, LI, SGS, TSS, CPVT, CRC, and MV collected samples and/or generated HTS and FISH data. DMB, SL, XK, ML, and BDR performed computational and statistical analyses for HTS, aCGH and qPCR. SL, DMB and GEL wrote the paper. All authors read and approved the final manuscript.

