Human Genetics, Volume 131, Issue 1, pp 131–143

Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity

  • Pramod Gautam
  • Pankaj Jha
  • Dhirendra Kumar
  • Shivani Tyagi
  • Binuja Varma
  • Debasis Dash
  • Arijit Mukhopadhyay
  • Indian Genome Variation Consortium
  • Mitali Mukerji (corresponding author)
Original Investigation


Copy number variations (CNVs) have provided a dynamic aspect to the apparently static human genome. We have analyzed CNVs larger than 100 kb in 477 healthy individuals from 26 diverse Indian populations of different linguistic, ethnic and geographic backgrounds. CNV regions (CNVRs) were identified using the Affymetrix 50K Xba 240 Array. We observed 1,425 and 1,337 CNVRs in the deletion and amplification sets, respectively, after pooling data from all the populations. More than 50% of the genes encompassed entirely in CNVs had both deletions and amplifications. There was wide variability across populations not only with respect to CNV extent (ranging from 0.04 to 1.14% of the genome under deletion and from 0.11 to 0.86% under amplification) but also in terms of functional enrichment of processes such as keratinization, serine proteases and their inhibitors, cadherins, homeobox and olfactory receptors. These differences did not correlate with the linguistic, ethnic or geographic background or the size of the populations. Certain processes were nearly exclusive to the deletion (serine proteases, keratinization, olfactory receptors, GPCRs) or duplication (homeobox, serine protease inhibitors, embryonic limb morphogenesis) datasets. Populations sharing the same enriched processes were observed to contain genes from different genomic loci. Comparison of polymorphic CNVRs (frequency of 5% or more) with those cataloged in the Database of Genomic Variants revealed that 78% (2,473) of the genes in CNVRs in Indian populations are novel. Validation of CNVs using Sequenom MassARRAY revealed extensive heterogeneity in CNV boundaries. Exploration of CNV profiles in such diverse populations would provide a valuable resource for understanding diversity in phenotypes and disease.


Olfactory Receptor; Serine Protease Inhibitor; Shrimp Alkaline Phosphatase; iPLEX; Sequenom MassARRAY
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We thank Amit Chaurasia for computational support and Ankita and Rishi Das Roy for IGV browser support. Financial support from CSIR (CMM0016, SIP0006) to MM and Council for Scientific and Industrial Research SRF to PG and PJ is acknowledged. We also acknowledge The Centre for Genomic Applications for the microarray and Sequenom facility and Spinco Biotech Pvt. Ltd. for support with the SVS7 software. The data is available at

Supplementary material

439_2011_1050_MOESM1_ESM.tif (1.3 mb)
Size distribution of CNVRs in the Database of Genomic Variants (DGV). The size distribution of segments in DGV shows that 23% of the segments are 100 kb or larger, implying the importance and prevalence of large CNVs in the genome. Supplementary material 1 (TIFF 1.27 Mb)
439_2011_1050_MOESM2_ESM.tif (2.8 mb)
Geographic locations of populations sampled. The populations sampled in this study cover the length and breadth of India and are from various ethnic and linguistic backgrounds. Supplementary material 2 (TIFF 2817 kb)
439_2011_1050_MOESM3_ESM.tif (1.2 mb)
Inter-probe distance of the Affymetrix 50K Xba array. This plot shows that around 35% of probe pairs are 5 kb apart and 25% of probe pairs are just 1 kb apart, giving confidence in calling CN-altered regions. Supplementary material 3 (TIFF 1203 kb)
439_2011_1050_MOESM4_ESM.docx (1.9 mb)
Chromosomal CNV landscape in all the populations. The 26 populations show different extents of CNVs. The red line depicts deletion and the blue line amplification. Supplementary material 4 (DOCX 1.91 mb)
439_2011_1050_MOESM5_ESM.tif (3.5 mb)
Multiple correspondence discriminant analysis (MCDA). MCDA was performed on all 26 Indian populations using 632 polymorphic CNVRs (present in more than 10% of the cohort) to detect population stratification. Supplementary material 5 (TIFF 3590 kb)
439_2011_1050_MOESM6_ESM.docx (2.3 mb)
Heterogeneity in CNV boundaries. Representation of deletion and amplification regions encompassing genes in different samples, as revealed by array data. The target of the Sequenom probe is indicated by a black arrow. There is enormous heterogeneity in CNV boundaries, and some of the CNV regions are not queried by the Sequenom MassARRAY probe. Supplementary material 6 (DOCX 2.27 mb)
439_2011_1050_MOESM7_ESM.doc (407 kb)
Heterogeneity in CNV boundaries in DGV. Heterogeneity in CNV boundaries as present in the public database DGV for some of the genes (ABCC1, ODAM, PRKG1 and SDK1) from our validation gene set. This observation, also seen in our data, indicates the difficulties posed in the validation of such loci. Supplementary material 7 (DOC 407 kb)
439_2011_1050_MOESM8_ESM.xls (38 kb)
Supplementary material 8 (XLS 39 kb)
439_2011_1050_MOESM9_ESM.xlsx (8.4 mb)
Supplementary material 9 (XLSX 8.42 mb)
439_2011_1050_MOESM10_ESM.xls (341 kb)
Supplementary material 10 (XLS 341 kb)
439_2011_1050_MOESM11_ESM.xlsx (42 kb)
Supplementary material 11 (XLSX 41.7 kb)
439_2011_1050_MOESM12_ESM.xls (49 kb)
Supplementary material 12 (XLS 49 kb)
439_2011_1050_MOESM13_ESM.xls (31 kb)
Supplementary material 13 (XLS 31 kb)
439_2011_1050_MOESM14_ESM.xls (38 kb)
Supplementary material 14 (XLS 38 kb)


Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Pramod Gautam (1)
  • Pankaj Jha (1)
  • Dhirendra Kumar (2)
  • Shivani Tyagi (3)
  • Binuja Varma (3)
  • Debasis Dash (2)
  • Arijit Mukhopadhyay (1)
  • Indian Genome Variation Consortium (1)
  • Mitali Mukerji (1, corresponding author)

  1. Genomics and Molecular Medicine, Institute of Genomics and Integrative Biology (CSIR), Delhi, India
  2. G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology (CSIR), Delhi, India
  3. The Centre of Genomic Application (IGIB-IMM Collaboration), 254, New Delhi, India
