Sequencing and comparative genome analysis of three Indians

Padh, Harish

doi:10.1007/s00335-021-09882-4

Published: 04 June 2021

Volume 32, pages 401–412, (2021)

Mammalian Genome

Harish Padh (ORCID: orcid.org/0000-0003-2305-6229)

A Correction to this article was published on 20 June 2021

This article has been updated

Abstract

Remarkable advances in next-generation DNA sequencing (NGS) technology have made personal genome analysis feasible and affordable. Here we present the whole-genome sequencing and analysis of three individuals, two males and one female, from different parts of India. Comparison with the human reference genome and the variant database showed a total of 4.0–4.85 million variants, primarily single nucleotide variants (SNVs), 350–600 K small insertions and deletions (INDELs), and previously unreported variants. The analysis of Y-chromosome and mitochondrial haplogroups revealed that the ancestors of the individuals arrived on the subcontinent at very different times using distinctly different migration routes. Approximately 500,000 novel SNPs and about 89,000 novel INDELs have been submitted to the NCBI. PCA and admixture analysis revealed that IHGP03, a male from Mizoram in the Northeast region, is strikingly different from the other two Indian genomes. Collectively, the data suggest that the complex admixture of the Indian population developed from several distinct waves of human migration over tens of thousands of years.

Change history

15 June 2021
The original online version of this article was revised: the reference citations were incorrectly placed under the Introduction section instead of as numbered bullet points. The numbered bullet points have now been placed correctly.
20 June 2021
A Correction to this paper has been published: https://doi.org/10.1007/s00335-021-09886-0

Acknowledgements

The author is indebted to Suhani Almal, Milee Agarwal, Sweta Patel, and Shivangi Patel of the B. V. Patel PERD Center, Ahmedabad, for initial technical assistance. For the initial analysis of the genomic data, the author thanks Kyusang Lee and Jong Bhak of The Genomics Institute, Republic of Korea. Financial assistance from the Gujarat State Biotechnology Mission, Government of Gujarat, is gratefully acknowledged.

Author information

Authors and Affiliations

Former Vice-Chancellor, Sardar Patel University, Vallabh Vidyanagar, Gujarat, 388120, India
Harish Padh

Corresponding author

Correspondence to Harish Padh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (docx 168 kb)

About this article

Cite this article

Padh, H. Sequencing and comparative genome analysis of three Indians. Mamm Genome 32, 401–412 (2021). https://doi.org/10.1007/s00335-021-09882-4

Received: 27 January 2021
Accepted: 26 May 2021
Published: 04 June 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s00335-021-09882-4
