Human Genetics, Volume 131, Issue 1, pp 131–143

Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity

  • Pramod Gautam
  • Pankaj Jha
  • Dhirendra Kumar
  • Shivani Tyagi
  • Binuja Varma
  • Debasis Dash
  • Arijit Mukhopadhyay
  • Indian Genome Variation Consortium
  • Mitali Mukerji
Original Investigation

DOI: 10.1007/s00439-011-1050-5

Cite this article as:
Gautam, P., Jha, P., Kumar, D. et al. Hum Genet (2012) 131: 131. doi:10.1007/s00439-011-1050-5

Abstract

Copy number variations (CNVs) have provided a dynamic aspect to the apparently static human genome. We analyzed CNVs larger than 100 kb in 477 healthy individuals from 26 diverse Indian populations of different linguistic, ethnic and geographic backgrounds. CNV regions (CNVRs) were identified using the Affymetrix 50K Xba 240 Array. After pooling data from all the populations, we observed 1,425 CNVRs in the deletion set and 1,337 in the amplification set. More than 50% of the genes encompassed entirely in CNVs had both deletions and amplifications. There was wide variability across populations, not only in CNV extent (0.04–1.14% of the genome under deletion and 0.11–0.86% under amplification) but also in functional enrichment of processes and gene families such as keratinization, serine proteases and their inhibitors, cadherins, homeobox genes and olfactory receptors. This variability did not correlate with the linguistic, ethnic or geographic background of the populations, or with population size. Certain processes were nearly exclusive to the deletion dataset (serine proteases, keratinization, olfactory receptors, GPCRs) or the duplication dataset (homeobox, serine protease inhibitors, embryonic limb morphogenesis). Populations sharing the same enriched processes were observed to contain genes from different genomic loci. Comparison of polymorphic CNVRs (frequency of 5% or more) with those cataloged in the Database of Genomic Variants revealed that 78% (2,473) of the genes in CNVRs in Indian populations are novel. Validation of CNVs using Sequenom MassARRAY revealed extensive heterogeneity in CNV boundaries. Exploring CNV profiles in such diverse populations would provide a valuable resource for understanding diversity in phenotypes and disease.

Supplementary material

439_2011_1050_MOESM1_ESM.tif (1.3 mb)
Size distribution of CNVRs in the Database of Genomic Variants (DGV). The size distribution of segments in DGV shows that 23% of the segments are 100 kb or larger, implying the importance and prevalence of large CNVs in the genome.
Supplementary material 1 (TIFF 1.27 Mb)
439_2011_1050_MOESM2_ESM.tif (2.8 mb)
Geographic locations of populations sampled. The populations sampled in this study cover the length and breadth of India and come from various ethnic and linguistic backgrounds.
Supplementary material 2 (TIFF 2817 kb)
439_2011_1050_MOESM3_ESM.tif (1.2 mb)
Inter-probe distance of the Affymetrix 50K Xba array. This plot shows that around 35% of probe pairs are 5 kb apart and 25% are just 1 kb apart, giving confidence in calling copy-number-altered regions.
Supplementary material 3 (TIFF 1203 kb)
439_2011_1050_MOESM4_ESM.docx (1.9 mb)
Chromosomal CNV landscape in all the populations. The 26 populations show different extents of CNVs. The red line depicts deletion and the blue line amplification.
Supplementary material 4 (DOCX 1.91 mb)
439_2011_1050_MOESM5_ESM.tif (3.5 mb)
Multiple correspondence discriminant analysis (MCDA) on all 26 Indian populations using 632 polymorphic CNVRs (present in more than 10% of the cohort) to detect population stratification.
Supplementary material 5 (TIFF 3590 kb)
439_2011_1050_MOESM6_ESM.docx (2.3 mb)
Heterogeneity in CNV boundaries. Representation of deletion and amplification regions encompassing genes in different samples, as revealed by array data. The target of the Sequenom probe is indicated by a black arrow. There is enormous heterogeneity in CNV boundaries, and some of the CNV regions are not queried by the Sequenom MassARRAY probe.
Supplementary material 6 (DOCX 2.27 mb)
439_2011_1050_MOESM7_ESM.doc (407 kb)
Heterogeneity in CNV boundaries in DGV. Heterogeneity in CNV boundaries as present in the public database DGV for some of the genes (ABCC1, ODAM, PRKG1 and SDK1) from our validation gene set. This observation, also seen in our data, indicates the difficulties posed in validating such loci.
Supplementary material 7 (DOC 407 kb)
439_2011_1050_MOESM8_ESM.xls (38 kb)
Supplementary material 8 (XLS 39 kb)
439_2011_1050_MOESM9_ESM.xlsx (8.4 mb)
Supplementary material 9 (XLSX 8.42 mb)
439_2011_1050_MOESM10_ESM.xls (341 kb)
Supplementary material 10 (XLS 341 kb)
439_2011_1050_MOESM11_ESM.xlsx (42 kb)
Supplementary material 11 (XLSX 41.7 kb)
439_2011_1050_MOESM12_ESM.xls (49 kb)
Supplementary material 12 (XLS 49 kb)
439_2011_1050_MOESM13_ESM.xls (31 kb)
Supplementary material 13 (XLS 31 kb)
439_2011_1050_MOESM14_ESM.xls (38 kb)
Supplementary material 14 (XLS 38 kb)

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Pramod Gautam (1)
  • Pankaj Jha (1)
  • Dhirendra Kumar (2)
  • Shivani Tyagi (3)
  • Binuja Varma (3)
  • Debasis Dash (2)
  • Arijit Mukhopadhyay (1)
  • Indian Genome Variation Consortium (1)
  • Mitali Mukerji (1)
  1. Genomics and Molecular Medicine, Institute of Genomics and Integrative Biology (CSIR), Delhi, India
  2. G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology (CSIR), Delhi, India
  3. The Centre of Genomic Application (IGIB-IMM Collaboration), 254, New Delhi, India
