Abstract
Gene and genome analysis play important roles in molecular biology research and individualized medicine. Thanks to advances in sequencing techniques, sequencing data are becoming increasingly abundant, and bioinformatic tools are needed to process them. As a combination of computational methods, statistics, and molecular biology, bioinformatics is a bridge between sequencing data and clinical interpretation. Over decades of development, bioinformatics has made notable advances in data storage, assembly speed and accuracy, variant identification, and user-friendly interfaces. In this chapter, we review the history and development of bioinformatics and introduce the principles and several popular computational tools for each step of the gene and genome analysis workflow: data generation, genome assembly, annotation, comparative analysis, variant calling, and interpretation. Because prokaryotic genomes differ from eukaryotic genomes, we also describe how data processing differs between humans and animals on the one hand and microorganisms on the other.
Abbreviations
- BAM: Binary SAM
- DNA: Deoxyribonucleic acid
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- NGS: Next-generation sequencing
- RNA: Ribonucleic acid
- SAM: Sequence Alignment/Map
- SNPs: Single nucleotide polymorphisms
- VCF: Variant call format
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Le Bui, N., Do, VQ., Chu, DT. (2024). Bioinformatics in Gene and Genome Analysis. In: Singh, V., Kumar, A. (eds) Advances in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-99-8401-5_4
DOI: https://doi.org/10.1007/978-981-99-8401-5_4
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8400-8
Online ISBN: 978-981-99-8401-5
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)