Sequencing and comparative genome analysis of three Indians

Mammalian Genome

A Correction to this article was published on 20 June 2021

This article has been updated

Abstract

Remarkable advances in next-generation DNA sequencing (NGS) technology have made personal genome analysis feasible and affordable. Here we present the whole-genome sequencing and analysis of three individuals, two males and one female, from different parts of India. Comparison with the reference human genome and a variant database identified 4.0–4.85 million variants per individual. These were primarily single nucleotide variants (SNVs), along with 350–600 K small insertions and deletions (INDELs) and a number of previously unreported (novel) variants. Analysis of Y-chromosome and mitochondrial haplogroups revealed that the ancestors of these individuals arrived on the subcontinent at very different times and by distinctly different migration routes. Approximately 500,000 novel SNPs and about 89,000 novel INDELs have been submitted to NCBI. Principal component analysis (PCA) and admixture analysis revealed that IHGP03, a male from Mizoram in Northeast India, is strikingly different from the other two Indian genomes. Collectively, the data suggest that the complex admixture of the Indian population arose from several distinct waves of human migration over tens of thousands of years.
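The variant tallies above (SNVs, INDELs, and novel variants) come from classifying each call made against the reference genome and checking it against a variant database. The snippet below is a minimal sketch of that bookkeeping, not the author's pipeline. It assumes the calls are stored in VCF format and treats a record as "novel" when its ID column holds no dbSNP rsID (`.`). The function names `classify_allele` and `summarize_vcf` and the toy records are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Plain nucleotide alphabet; anything else (e.g. "<DEL>", "*", breakends) is "OTHER".
_BASES = set("ACGTN")


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'SNV', 'INDEL', or 'OTHER'."""
    if not set(ref.upper()) <= _BASES or not set(alt.upper()) <= _BASES:
        return "OTHER"  # symbolic, breakend, or spanning-deletion allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"  # single-base substitution
    if len(ref) != len(alt):
        return "INDEL"  # length change => insertion or deletion
    return "OTHER"  # equal-length multi-base substitution (MNV)


def summarize_vcf(lines: Iterable[str]) -> Counter:
    """Count variant classes, and novel ones separately, in VCF data lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and meta/header lines
        fields = line.rstrip("\n").split("\t")
        var_id, ref, alts = fields[2], fields[3], fields[4]
        # Assumption: an empty ID ('.') means the variant is absent from dbSNP.
        novel = var_id == "."
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            if alt == ".":
                continue  # no alternate allele at this site
            kind = classify_allele(ref, alt)
            counts[kind] += 1
            if novel:
                counts["novel_" + kind] += 1
    return counts


# Hypothetical toy records; "rs_toy*" are placeholder IDs, not real rsIDs.
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\trs_toy1\tA\tAC\t.\tPASS\t.",
    "chr1\t2000\t.\tT\tTA\t.\tPASS\t.",
    "chr1\t3000\trs_toy2\tC\tG\t.\tPASS\t.",
    "chr2\t4000\t.\tG\tA,T\t.\tPASS\t.",
    "chr3\t5000\t.\tAT\tGC\t.\tPASS\t.",
]

counts = summarize_vcf(TOY_VCF)
print(dict(counts))
```

Counting per ALT allele, rather than per record, keeps multi-allelic sites from being undercounted. A real analysis would also apply quality filters (for example, the FILTER column) before tallying.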

Figs. 1–8

Change history

  • 15 June 2021

    The original online version of this article was revised: the reference citations had been incorrectly placed in the Introduction instead of as numbered bullet points. The numbered bullet points have now been placed correctly.

  • 20 June 2021

    A Correction to this paper has been published: https://doi.org/10.1007/s00335-021-09886-0

Acknowledgements

The author is indebted to Suhani Almal, Milee Agarwal, Sweta Patel, and Shivangi Patel of the B. V. Patel PERD Center, Ahmedabad, for initial technical assistance. The author thanks Kyusang Lee and Jong Bhak of The Genomics Institute, Republic of Korea, for the initial analysis of the genomic data. Financial assistance from the Gujarat State Biotechnology Mission, Government of Gujarat, is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Harish Padh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Electronic supplementary material: Supplementary file 1 (DOCX, 168 KB).

About this article

Cite this article

Padh, H. Sequencing and comparative genome analysis of three Indians. Mamm Genome 32, 401–412 (2021). https://doi.org/10.1007/s00335-021-09882-4

  • DOI: https://doi.org/10.1007/s00335-021-09882-4