Abstract
Remarkable advances in next-generation DNA sequencing (NGS) technology have made personal genome analysis feasible and affordable. Here we present the whole-genome sequencing and analysis of three individuals, two males and one female, from different parts of India. Comparison with the reference human genome and the variant database showed a total of 4.0–4.85 million variants, primarily single nucleotide variants (SNVs), 350–600 K small insertions and deletions (INDELs), and previously unreported novel variants. Analysis of Y-chromosome and mitochondrial haplogroups revealed that the ancestors of the three individuals arrived on the subcontinent at very different times by distinctly different migration routes. Approximately 500,000 novel SNPs and about 89,000 novel INDELs have been submitted to the NCBI. PCA and Admix analyses revealed that IHGP03, a male from Mizoram in the Northeast region, is strikingly different from the other two Indian genomes. Collectively, the data suggest that the complex admixture of the Indian population developed from several distinct waves of human migration over tens of thousands of years.
Change history
15 June 2021
The original online version of this article was revised: the reference citations had been incorrectly placed under the Introduction section instead of as numerical bullet points. The numerical bullet points have now been placed correctly.
20 June 2021
A Correction to this paper has been published: https://doi.org/10.1007/s00335-021-09886-0
Acknowledgements
The author is indebted to Suhani Almal, Milee Agarwal, Sweta Patel, and Shivangi Patel of the B. V. Patel PERD Center, Ahmedabad, for initial technical assistance. For the initial analysis of the genomic data, the author thanks Kyusang Lee and Jong Bhak of The Genomics Institute, Republic of Korea. Financial assistance from the Gujarat State Biotechnology Mission, Government of Gujarat, is gratefully acknowledged.
Cite this article
Padh, H. Sequencing and comparative genome analysis of three Indians. Mamm Genome 32, 401–412 (2021). https://doi.org/10.1007/s00335-021-09882-4
DOI: https://doi.org/10.1007/s00335-021-09882-4