Computational detection and experimental validation of segmental duplications and associated copy number variations in water buffalo (Bubalus bubalis)


Duplicated sequences are an important source of gene evolution and structural variation within mammalian genomes. Using a read depth approach based on next-generation sequencing, we performed a genome-wide analysis of segmental duplications (SDs) and associated copy number variations (CNVs) in the water buffalo (Bubalus bubalis). By aligning short reads of Olimpia (the reference water buffalo) to the UMD3.1 cattle genome, we identified 1,038 segmental duplications comprising 44.6 Mb (equivalent to ~1.73% of the cattle genome) of the autosomal and X chromosomal sequence in the buffalo genome. We experimentally validated 70.3% (71/101) of these duplications using fluorescent in situ hybridization (FISH). We also detected a total of 1,344 CNV regions across 14 additional water buffaloes, amounting to 59.8 Mb of variable sequence, or the equivalent of 2.2% of the cattle genome. The CNV regions overlap 1,245 genes that are significantly enriched for specific biological functions, including immune response, oxygen transport, sensory system and signal transduction. Additionally, we performed array Comparative Genomic Hybridization (aCGH) experiments using the 14 water buffaloes as test samples and Olimpia as the reference. Using a linear regression model, we observed a high Pearson correlation (r = 0.781) between the log2 ratios of the sequence-based copy number estimates and the log2 ratios of the aCGH probes. We further designed quantitative PCR (qPCR) assays to confirm CNV regions within or near annotated genes and found 74.2% agreement with our CNV predictions. These results confirm that sub-chromosome-scale structural rearrangements are present in cattle and water buffalo. This information on genome variation will be of value for evolutionary and phenotypic studies, and may be useful for selective breeding of both species.
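The comparison between sequence-based copy number estimates and aCGH described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`log2_ratio`, `pearson_r`, `ols_fit`) are hypothetical, the reference copy number of 2 is an assumed diploid baseline, and the toy values are invented purely to exercise the code; they are not data from the study.

```python
import math


def log2_ratio(test_cn, ref_cn):
    """Return log2(test / reference) for a pair of copy number estimates."""
    return math.log2(test_cn / ref_cn)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical toy values (NOT from the paper): read-depth copy number
# estimates for one test animal, an assumed reference copy number of 2,
# and made-up aCGH log2 ratios for the same regions.
test_cn = [2.0, 4.0, 1.0, 3.0, 2.0, 6.0]
ref_cn = [2.0] * len(test_cn)
acgh_log2 = [0.05, 0.80, -0.90, 0.50, -0.10, 1.30]

seq_log2 = [log2_ratio(t, r) for t, r in zip(test_cn, ref_cn)]
r = pearson_r(seq_log2, acgh_log2)
slope, intercept = ols_fit(seq_log2, acgh_log2)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")

# The validation rate quoted in the abstract is simple arithmetic:
fish_rate = round(100 * 71 / 101, 1)  # 71 of 101 duplications confirmed
```

In simple linear regression the Pearson r is the square root of the fraction of variance explained, which is why a regression model and a correlation coefficient can be reported together as in the abstract.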



Data Availability

The aCGH raw data from the 14 water buffaloes have been submitted to the NCBI under GEO accession ID GSE118117. All 101 FISH results are posted on



Abbreviations

  • array Comparative Genomic Hybridization
  • Bovine Leucocyte Antigens
  • basic transcription factor 3
  • copy number
  • CNV regions
  • copy number variations
  • cycle thresholds
  • Fc fragment of IgG receptor IIIa
  • false discovery rate
  • frizzled class receptor 3
  • fluorescence in situ hybridization
  • high throughput sequencing
  • killer cell lectin like receptor K1
  • Locally Weighted Scatter-plot Smoother
  • mitotic arrest deficient 2 like 1
  • major histocompatibility complex
  • olfactory receptor
  • pregnancy-associated glycoprotein
  • peptidase inhibitor 3
  • read depth
  • read pair
  • sequence assembly
  • segmental duplications
  • single nucleotide polymorphisms
  • split read
  • standard deviations
  • T cell receptor alpha variable
  • UL16 binding protein 3
  • whole genome shotgun sequence detection




We thank Reuben Anderson and Alexandre Dimtchev for technical assistance. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture. The USDA is an equal opportunity provider and employer.


GEL was partially supported by appropriated project 1265-3200-083-00D from the USDA Agricultural Research Service (Beltsville Agricultural Research Center), AFRI grant number 2013-67015-20951 from the USDA National Institute of Food and Agriculture (NIFA) Animal Genome and Reproduction Programs, and BARD grant number US-4997-17 from the US-Israel Binational Agricultural Research and Development (BARD) Fund. WL and DMB were supported by appropriated project 5090-31000-024-00-D from the USDA Agricultural Research Service (Dairy Forage Research Center). WYL and JLW are funded by the JS Davies Bequest to the University of Adelaide.

Author information




DMB and GEL conceived and designed the experiments. JLW, DI, LI, SGS, TSS, CPVT, CRC, and MV collected samples and/or generated HTS and FISH data. DMB, SL, XK, ML, and BDR performed computational and statistical analyses for HTS, aCGH and qPCR. SL, DMB and GEL wrote the paper. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Derek M. Bickhart or George E. Liu.

Ethics declarations

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Liu, S., Kang, X., Catacchio, C.R. et al. Computational detection and experimental validation of segmental duplications and associated copy number variations in water buffalo (Bubalus bubalis). Funct Integr Genomics 19, 409–419 (2019).



  • Segmental duplication
  • Copy number variation
  • Bubalus bubalis
  • Fluorescent in situ hybridization
  • Array Comparative Genomic Hybridization
  • Quantitative PCR