Whole genome variant analysis in three ethnically diverse Indians

  • Research Article
  • Genes & Genomics

Abstract

India represents a remarkable confluence of geographically, linguistically and socially disparate ethnic populations (Indian Genome Variation Consortium, J Genet 87:3–20, 2008), and understanding the genetic diversity of the Indian population remains a daunting task. In this paper we present a detailed analysis of genomic variation, sequenced at high depth of coverage (~ 30×) on the Illumina HiSeq 2000 platform, in three healthy Indian male individuals, one from each of three geographically delineated regions and linguistic phyla: the high-altitude region of Ladakh (Tibeto-Burman linguistic phylum), the sub-mountainous region of Kumaun (Indo-European linguistic phylum) and the sea-level region of Telangana (Dravidian linguistic phylum). The aim was to probe the extent of genetic diversity in the Indian population. Sequencing yielded high-quality data (~ 95% of the total reads aligned to the human reference genome for each sample) with very good alignment quality (> 80% of the filtered mapped reads had a quality score of 60). Comparison with the human reference genome identified a total of 4.3, 3.7 and 4.3 million single nucleotide variations in the genomes of the high-altitude, sub-mountainous and sea-level individuals, respectively. Approximately 17.3%, 18.2% and 17.4% of the variants were unique to the respective genomes. The study identified many novel variations in the three diverse genomes (132,970 in the Ladakh, 112,317 in the Kumaun and 128,881 in the Telangana individual) and is an important resource for creating a baseline and a comprehensive catalogue of human genomic variation across the Indian subcontinent as well as the Asian continent.
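As a rough aid to reading the figures quoted in the abstract, the following Python sketch converts the reported percentages into approximate counts and expresses the novel variations as a share of each genome's single nucleotide variations. It rests on two assumptions that the abstract does not confirm: that the "unique" percentages apply to the rounded SNV totals, and that the novel-variation counts can be compared against those same totals. The names `SAMPLES` and `summarize` are hypothetical and are used only for illustration.

```python
# Rounded figures as quoted in the abstract; every derived value is approximate.
# Assumption: the "unique" percentages and novel counts are relative to the SNV totals.
SAMPLES = {
    "Ladakh (high altitude)": {"snvs_million": 4.3, "unique_pct": 17.3, "novel": 132_970},
    "Kumaun (sub-mountainous)": {"snvs_million": 3.7, "unique_pct": 18.2, "novel": 112_317},
    "Telangana (sea level)": {"snvs_million": 4.3, "unique_pct": 17.4, "novel": 128_881},
}


def summarize(sample):
    """Return the approximate unique-variant count and the novel share (in percent)."""
    total = sample["snvs_million"] * 1_000_000
    approx_unique = round(total * sample["unique_pct"] / 100)
    novel_pct = round(100 * sample["novel"] / total, 2)
    return {"approx_unique": approx_unique, "novel_pct": novel_pct}


for name, sample in SAMPLES.items():
    result = summarize(sample)
    print(f"{name}: ~{result['approx_unique']:,} unique variants, "
          f"novel share ~{result['novel_pct']}% of SNVs")
```

Under these assumptions, roughly 0.7 million variants per genome are unique to that individual, and novel variations amount to about 3% of the SNV total in each case. Because the inputs are rounded, the outputs should be read as orders of magnitude, not as values reported by the study.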

References

  • Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA, 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073

  • Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM (2012) An integrated map of genetic variation from 1092 human genomes. Nature 491:56–65

  • Ahn SM et al (2009) The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res 19:1622–1629

  • Chambers JC et al (2014) The South Asian genome. PLoS ONE 9:e102645

  • Chandrasekar A et al (2009) Updating phylogeny of mitochondrial DNA macrohaplogroup M in India: dispersal of modern human in South Asian corridor. PLoS ONE 4:e7447

  • Cirulli ET, Goldstein DB (2010) Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 11:415–425

  • da Huang W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37:1–13

  • Derenko M et al (2010) Origin and post-glacial dispersal of mitochondrial DNA haplogroups C and D in Northern Asia. PLoS ONE 5:e15214

  • Dogan H, Can H, Otu HH (2014) Whole genome sequence of a Turkish individual. PLoS ONE 9:e85233

  • Fujimoto A et al (2010) Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nat Genet 42:931–936

  • Gupta R et al (2012) Sequencing and analysis of a South Asian-Indian personal genome. BMC Genom 13:440

  • Hodgkinson A, Eyre-Walker A (2011) Variation in the mutation rate across mammalian genomes. Nat Rev Genet 12:756–766

  • Hu CJ, Wang LY, Chodosh LA, Keith B, Simon MC (2003) Differential roles of hypoxia-inducible factor 1α (HIF-1α) and HIF-2α in hypoxic gene regulation. Mol Cell Biol 23:9361–9374

  • Indian Genome Variation Consortium (2008) Genetic landscape of the people of India: a canvas for disease gene exploration. J Genet 87:3–20

  • Ju YS et al (2011) Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals. Nat Genet 43:745–752

  • Kasowski M et al (2010) Variation in transcription factor binding among humans. Science 328:232–235

  • Kim JI et al (2009) A highly annotated whole-genome sequence of a Korean individual. Nature 460:1011–1015

  • Kim SH, Turnbull J, Guimond S (2011) Extracellular matrix and cell signalling: the dynamic cooperation of integrin, proteoglycan and growth factor receptor. J Endocrinol 209:139–151

  • Kloss-Brandstätter A, Pacher D, Schönherr S, Weissensteiner H, Binna R, Specht G, Kronenberg F (2011) HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum Mutat 32:25–32

  • Kryazhimskiy S, Plotkin JB (2008) The population genetics of dN/dS. PLoS Genet 4:e1000304

  • Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760

  • Lorenzo FR et al (2014) A genetic mechanism for Tibetan high-altitude adaptation. Nat Genet 46:951–956

  • Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24:133–141

  • Marrero P, Abu-Amero KK, Larruga JM, Cabrera VM (2016) Carriers of human mitochondrial DNA macrohaplogroup M colonized India from southeastern Asia. BMC Evol Biol 16:246

  • Metspalu M et al (2004) Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet 5:26

  • Mills RE et al (2011) Natural genetic variation caused by small insertions and deletions in the human genome. Genome Res 21:830–839

  • Nei M (1972) Genetic distance between populations. Am Nat 106:283–293

  • Oota H, Saitou N, Ueda S (2002) A large-scale analysis of human mitochondrial DNA sequences with special reference to the population history of East Eurasia. Anthropol Sci 110:293–312

  • Passarino G, Semino O, Bernini LF, Santachiara-Benerecetti AS (1996) Pre-caucasoid and caucasoid genetic features of the Indian population, revealed by mtDNA polymorphisms. Am J Hum Genet 59:927–934

  • Passarino G, Semino O, Quintana-Murci L, Excoffier L, Hammer M, Santachiara-Benerecetti AS (1998) Different genetic components in the Ethiopian population, identified by mtDNA and Y-chromosome polymorphisms. Am J Hum Genet 62:420–434

  • Patowary A et al (2012) Systematic analysis and functional annotation of variations in the genome of an Indian individual. Hum Mutat 33:1133–1140

  • Petousi N, Robbins PA (2014) Human adaptation to the hypoxia of high altitude: the Tibetan paradigm from the pregenomic to the postgenomic era. J Appl Physiol 116:875–884

  • Pineda-Tenor D, Garcia-Alvarez M, Jimenez-Sousa MA, Vazquez-Moron S, Resino S (2015) Relationship between ITPA polymorphisms and hemolytic anemia in HCV-infected patients after ribavirin-based therapy: a meta-analysis. J Transl Med 13:320

  • Pritchard JK (2011) Whole-genome sequencing data offer insights into human demography. Nat Genet 43:923–925

  • Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, Santachiara-Benerecetti AS (1999) Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet 23:437–441

  • Simonson TS et al (2010) Genetic evidence for high-altitude adaptation in Tibet. Science 329:72–75

  • Srivastava S, Bhagi S, Kumari B, Chandra K, Sarkar S, Ashraf MZ (2011) Association of polymorphisms in angiotensin and aldosterone synthase genes of the renin-angiotensin-aldosterone system with high-altitude pulmonary edema. J Renin Angiotensin Aldosterone Syst 13:155–160

  • Tong P et al (2010) Sequencing and analysis of an Irish human genome. Genome Biol 11:R91

  • Tucker T, Marra M, Friedman JM (2009) Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet 85:142–154

  • Voelkerding KV, Dames SA, Durtschi JD (2009) Next-generation sequencing: from basic research to diagnostics. Clin Chem 55:641–658

  • Wetterbom A, Sevov M, Cavelier L, Bergstrom TF (2006) Comparative genomic analysis of human and chimpanzee indicates a key role for indels in primate evolution. J Mol Evol 63:682–690

  • Wong LP et al (2014) Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing. PLoS Genet 10(5):e1004377

  • Xiang K et al (2013) Identification of a Tibetan-specific mutation in hypoxic gene EGLN1 and its contribution to high-altitude adaptation. Mol Biol Evol 30:1889–1898

Acknowledgements

This research was funded by the Defence Research and Development Organization at the Defence Institute of Physiology and Allied Sciences under Project Number ST/14-15/DIP-265/2535/D(R&D) (subproject 7) to S Sarkar. The authors are grateful to Col Shashi Shukla for interaction with participants and logistic support. The authors also thank Sucha Singh and Neha Thakur for their assistance during the course of the study.

Author information

Corresponding author

Correspondence to Soma Sarkar.

Ethics declarations

Conflict of interest

Seema Malhotra, Sayar Singh and Soma Sarkar declare that they have no conflict of interest.

Ethical standards

All procedures performed in this study were in accordance with the ethical standards of the institutional research committee of the Defence Institute of Physiology and Allied Sciences and with the 1964 Helsinki Declaration and its amendments.

Additional information

Data Availability: The whole genome sequencing data from the present study are available in the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/) under accession number SRP071962, which covers all three samples: WGS1A_High Altitude Native_Ladakh_Genomic DNA, WGS2A_Sub mountainous_Kumaun_Genomic DNA and WGS3A_Sealevel_Telangana_Genomic DNA.

Dr. Soma Sarkar has retired.

Cite this article

Malhotra, S., Singh, S. & Sarkar, S. Whole genome variant analysis in three ethnically diverse Indians. Genes Genom 40, 497–510 (2018). https://doi.org/10.1007/s13258-018-0650-z
