Bioinformatics in Gene and Genome Analysis

Advances in Bioinformatics

Abstract

Gene and genome analysis plays an important role in molecular biology research and individualized medicine. As sequencing techniques have advanced, sequencing data have become increasingly abundant, and bioinformatic tools are required to handle them. As a combination of computational methods, statistics, and molecular biology, bioinformatics bridges sequencing data and clinical interpretation. Over half a decade of development, bioinformatics has made notable advances in data storage, assembly speed and accuracy, variant identification, and user-friendly interfaces. In this chapter, we review the history and development of bioinformatics and introduce the principles and several popular computational tools for each step in the gene and genome analysis workflow: data generation, genome assembly, annotation, comparative analysis, variant calling, and finally interpretation. Because prokaryotic genomes differ from eukaryotic genomes, we also describe how data processing differs between humans and animals on the one hand and microorganisms on the other.

Abbreviations

BAM: Binary SAM
DNA: Deoxyribonucleic acid
KEGG: Kyoto Encyclopedia of Genes and Genomes
NGS: Next generation sequencing
RNA: Ribonucleic acid
SAM: Sequence Alignment/Map
SNPs: Single nucleotide polymorphisms
VCF: Variant call format

Author information

Corresponding author

Correspondence to Dinh-Toi Chu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this chapter

Le Bui, N., Do, VQ., Chu, DT. (2024). Bioinformatics in Gene and Genome Analysis. In: Singh, V., Kumar, A. (eds) Advances in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-99-8401-5_4
