Abstract
India represents a remarkable confluence of geographically, linguistically and socially disparate ethnic populations (Indian Genome Variation Consortium, J Genet 87:3–20, 2008). Understanding the genetic diversity of the Indian population remains a daunting task. In this paper we present a detailed analysis of genomic variation in three healthy Indian male individuals, sequenced at high depth (~ 30× coverage) on the Illumina HiSeq 2000 platform. Each individual belongs to a different geographically delineated region and linguistic phylum: the high-altitude region of Ladakh (Tibeto-Burman linguistic phylum), the sub-mountainous region of Kumaun (Indo-European linguistic phylum) and the sea-level region of Telangana (Dravidian linguistic phylum). The aim was to probe the extent of genetic diversity in our population. Sequencing yielded high-quality data (~ 95% of the total reads aligned to the human reference genome for each sample) with very good alignment quality (> 80% of the filtered mapped reads had a quality score of 60). Comparison with the human reference genome identified a total of 4.3, 3.7 and 4.3 million single nucleotide variations in the genomes of the high-altitude, sub-mountainous and sea-level individuals, respectively. Approximately 17.3, 18.2 and 17.4% of the variants were unique in the three genomes. The study identified many novel variations in the three diverse genomes (132,970 in the Ladakh, 112,317 in the Kumaun and 128,881 in the Telangana individual) and is an important resource for creating a baseline and a comprehensive catalogue of human genomic variation across the Indian subcontinent as well as the Asian continent.
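As a rough illustration of the scale of these figures, the following sketch converts the percentages and counts quoted in the abstract into approximate absolute numbers and novel-variant fractions. It rests on assumptions not stated in the abstract: that the three percentages are listed in the same order as the SNV totals (Ladakh, Kumaun, Telangana), and that both the "unique" percentages and the novel counts can be taken relative to the rounded SNV totals. The names `GENOMES` and `summarize` are hypothetical.

```python
# Figures quoted in the abstract. SNV totals are rounded to 0.1 million
# there, so every derived value is approximate.
# ASSUMPTION: percentages are in the same order as the SNV totals.
GENOMES = {
    "Ladakh": {"snvs": 4_300_000, "unique_pct": 17.3, "novel": 132_970},
    "Kumaun": {"snvs": 3_700_000, "unique_pct": 18.2, "novel": 112_317},
    "Telangana": {"snvs": 4_300_000, "unique_pct": 17.4, "novel": 128_881},
}


def summarize(stats):
    """Return (approx. unique variant count, novel variants as % of SNVs).

    ASSUMPTION: both quantities use the rounded SNV total as denominator.
    """
    unique_count = round(stats["snvs"] * stats["unique_pct"] / 100)
    novel_pct = round(100 * stats["novel"] / stats["snvs"], 2)
    return unique_count, novel_pct


if __name__ == "__main__":
    for region, stats in GENOMES.items():
        unique_count, novel_pct = summarize(stats)
        print(f"{region}: ~{unique_count:,} unique variants, "
              f"~{novel_pct}% novel")
```

Under these assumptions, novel variations amount to roughly 3% of the single nucleotide variations reported for each genome.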
References
Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA, 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073
Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM (2012) An integrated map of genetic variation from 1092 human genomes. Nature 491:56–65
Ahn SM et al (2009) The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res 19:1622–1629
Chambers JC et al (2014) The South Asian genome. PLoS ONE 9:e102645
Chandrasekar A et al (2009) Updating phylogeny of mitochondrial DNA macrohaplogroup M in India: dispersal of modern human in South Asian corridor. PLoS ONE 4:e7447
Cirulli ET, Goldstein DB (2010) Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 11:415–425
da Huang W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37:1–13
Derenko M et al (2010) Origin and post-glacial dispersal of mitochondrial DNA haplogroups C and D in Northern Asia. PLoS ONE 5:e15214
Dogan H, Can H, Otu HH (2014) Whole genome sequence of a Turkish individual. PLoS ONE 9:e85233
Fujimoto A et al (2010) Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nat Genet 42:931–936
Gupta R et al (2012) Sequencing and analysis of a South Asian-Indian personal genome. BMC Genom 13:440
Hodgkinson A, Eyre-Walker A (2011) Variation in the mutation rate across mammalian genomes. Nat Rev Genet 12:756–766
Hu CJ, Wang LY, Chodosh LA, Keith B, Simon MC (2003) Differential roles of hypoxia-inducible factor 1α (HIF-1α) and HIF-2α in hypoxic gene regulation. Mol Cell Biol 23:9361–9374
Indian Genome Variation Consortium (2008) Genetic landscape of the people of India: a canvas for disease gene exploration. J Genet 87:3–20
Ju YS et al (2011) Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals. Nat Genet 43:745–752
Kasowski M et al (2010) Variation in transcription factor binding among humans. Science 328:232–235
Kim JI et al (2009) A highly annotated whole-genome sequence of a Korean individual. Nature 460:1011–1015
Kim SH, Turnbull J, Guimond S (2011) Extracellular matrix and cell signalling: the dynamic cooperation of integrin, proteoglycan and growth factor receptor. J Endocrinol 209:139–151
Kloss-Brandstätter A, Pacher D, Schönherr S, Weissensteiner H, Binna R, Specht G, Kronenberg F (2011) HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum Mutat 32:25–32
Kryazhimskiy S, Plotkin JB (2008) The population genetics of dN/dS. PLoS Genet 4:e1000304
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
Lorenzo FR et al (2014) A genetic mechanism for Tibetan high-altitude adaptation. Nat Genet 46:951–956
Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24:133–141
Marrero P, Abu-Amero KK, Larruga JM, Cabrera VM (2016) Carriers of human mitochondrial DNA macrohaplogroup M colonized India from southeastern Asia. BMC Evol Biol 16:246
Metspalu M et al (2004) Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet 5:26
Mills RE et al (2011) Natural genetic variation caused by small insertions and deletions in the human genome. Genome Res 21:830–839
Nei M (1972) Genetic distance between populations. Am Nat 106:283–293
Oota H, Saitou N, Ueda S (2002) A large-scale analysis of human mitochondrial DNA sequences with special reference to the population history of East Eurasia. Anthropol Sci 110:293–312
Passarino G, Semino O, Bernini LF, Santachiara-Benerecetti AS (1996) Pre-caucasoid and caucasoid genetic features of the Indian population, revealed by mtDNA polymorphisms. Am J Hum Genet 59:927–934
Passarino G, Semino O, Quintana-Murci L, Excoffier L, Hammer M, Santachiara-Benerecetti AS (1998) Different genetic components in the Ethiopian population, identified by mtDNA and Y-chromosome polymorphisms. Am J Hum Genet 62:420–434
Patowary A et al (2012) Systematic analysis and functional annotation of variations in the genome of an Indian individual. Hum Mutat 33:1133–1140
Petousi N, Robbins PA (2014) Human adaptation to the hypoxia of high altitude: the Tibetan paradigm from the pregenomic to the postgenomic era. J Appl Physiol 116:875–884
Pineda-Tenor D, Garcia-Alvarez M, Jimenez-Sousa MA, Vazquez-Moron S, Resino S (2015) Relationship between ITPA polymorphisms and hemolytic anemia in HCV-infected patients after ribavirin-based therapy: a meta-analysis. J Transl Med 13:320
Pritchard JK (2011) Whole-genome sequencing data offer insights into human demography. Nat Genet 43:923–925
Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, Santachiara-Benerecetti AS (1999) Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet 23:437–441
Simonson TS et al (2010) Genetic evidence for high-altitude adaptation in Tibet. Science 329:72–75
Srivastava S, Bhagi S, Kumari B, Chandra K, Sarkar S, Ashraf MZ (2011) Association of polymorphisms in angiotensin and aldosterone synthase genes of the renin-angiotensin-aldosterone system with high-altitude pulmonary edema. J Renin Angiotensin Aldosterone Syst 13:155–160
Tong P et al (2010) Sequencing and analysis of an Irish human genome. Genome Biol 11:R91
Tucker T, Marra M, Friedman JM (2009) Massively parallel sequencing: the next big thing in genetic medicine. Am J Hum Genet 85:142–154
Voelkerding KV, Dames SA, Durtschi JD (2009) Next-generation sequencing: from basic research to diagnostics. Clin Chem 55:641–658
Wetterbom A, Sevov M, Cavelier L, Bergstrom TF (2006) Comparative genomic analysis of human and chimpanzee indicates a key role for indels in primate evolution. J Mol Evol 63:682–690
Wong LP et al (2014) Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing. PLoS Genet 10:e1004377
Xiang K et al (2013) Identification of a Tibetan-specific mutation in hypoxic gene EGLN1 and its contribution to high-altitude adaptation. Mol Biol Evol 30:1889–1898
Acknowledgements
This research was funded by the Defence Research and Development Organization at the Defence Institute of Physiology and Allied Sciences under Project Number ST/14-15/DIP-265/2535/D(R&D) (subproject 7) to S Sarkar. The authors are grateful to Col Shashi Shukla for facilitating interaction with participants and for logistic support. The authors also acknowledge Sucha Singh and Neha Thakur for their assistance during the course of the study.
Ethics declarations
Conflict of interest
Seema Malhotra, Sayar Singh and Soma Sarkar declare that they have no conflict of interest.
Ethical standards
All procedures performed in this study were in accordance with the ethical standards of the institutional research committee of the Defence Institute of Physiology and Allied Sciences and with the 1964 Helsinki Declaration and its amendments.
Additional information
Data Availability: The whole genome sequencing data of the present study are available in the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/) under accession number SRP071962 for all three samples: WGS1A_High Altitude Native_Ladakh_Genomic DNA, WGS2A_Sub mountainous_Kumaun_Genomic DNA and WGS3A_Sealevel_Telangana_Genomic DNA.
Dr. Soma Sarkar has retired.
Cite this article
Malhotra, S., Singh, S. & Sarkar, S. Whole genome variant analysis in three ethnically diverse Indians. Genes Genom 40, 497–510 (2018). https://doi.org/10.1007/s13258-018-0650-z
DOI: https://doi.org/10.1007/s13258-018-0650-z