Abstract
Variant annotation, the process of assigning functional information to variants and mutations in DNA, is a crucial step in genomic sequence analysis. Its outcomes matter because they can directly influence the conclusions drawn in disease studies. Once genomic sequencing data have been processed and variants called, it is vital to recognise the functional content of these variants and to analyse them so that they can be prioritised. This remains an active problem in computational biology, with many open challenges yet to be addressed. In this paper, we present a comprehensive review of current work on variant annotation. We detail the tools and methods developed for variant annotation, along with the datasets these methods use. We also discuss open challenges and directions for future research.
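As a rough illustration of the annotate-then-prioritise workflow described in the abstract, the sketch below first attaches to each called variant the names of the genes it overlaps, then keeps rare, genic variants with a low predicted-deleteriousness score. Everything here is an assumption for illustration: the gene coordinates, variants, allele frequencies and scores are hypothetical toy values, and the thresholds (population allele frequency below 1%, a SIFT-style score of at most 0.05) are common conventions rather than values taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Variant:
    chrom: str
    pos: int                       # 1-based position of the variant
    ref: str
    alt: str
    pop_af: float                  # population allele frequency (hypothetical, gnomAD-like)
    sift: Optional[float] = None   # SIFT-style score; lower means more likely deleterious
    genes: List[str] = field(default_factory=list)

# Hypothetical gene intervals: (chrom, start, end, name), 1-based inclusive
GENES: List[Tuple[str, int, int, str]] = [
    ("chr1", 1000, 5000, "GENE_A"),
    ("chr1", 4000, 9000, "GENE_B"),
    ("chr2", 100, 800, "GENE_C"),
]

def annotate(variants, genes):
    """Annotation step: record every gene whose interval contains the variant."""
    for v in variants:
        v.genes = [name for chrom, start, end, name in genes
                   if chrom == v.chrom and start <= v.pos <= end]
    return variants

def prioritise(variants, max_af=0.01, max_sift=0.05):
    """Prioritisation step: keep rare, genic, predicted-deleterious variants,
    ordered from most to least damaging score, then by rarity."""
    kept = [v for v in variants
            if v.genes and v.pop_af < max_af
            and v.sift is not None and v.sift <= max_sift]
    return sorted(kept, key=lambda v: (v.sift, v.pop_af))

if __name__ == "__main__":
    calls = [
        Variant("chr1", 4500, "C", "T", pop_af=0.0002, sift=0.01),  # overlaps GENE_A and GENE_B
        Variant("chr1", 2000, "G", "A", pop_af=0.30, sift=0.00),    # common, filtered out
        Variant("chr2", 50, "A", "G", pop_af=0.0001, sift=0.02),    # intergenic, filtered out
        Variant("chr2", 300, "T", "C", pop_af=0.001, sift=0.40),    # predicted tolerated
    ]
    for v in prioritise(annotate(calls, GENES)):
        print(v.chrom, v.pos, v.ref, v.alt, ",".join(v.genes), v.sift)
```

A linear scan over gene intervals is used only for clarity; annotators working on whole exomes or genomes typically index genomic features so that millions of variants can be looked up efficiently.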
Acknowledgements
The authors would like to thank Dr. Christopher Antony Cassa, Assistant Professor, Harvard Medical School, for his continued support and ideas for this manuscript.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hebbar, P., Sowmya, S.K. (2022). Genomic Variant Annotation: A Comprehensive Review of Tools and Techniques. In: Abraham, A., Gandhi, N., Hanne, T., Hong, TP., Nogueira Rios, T., Ding, W. (eds) Intelligent Systems Design and Applications. ISDA 2021. Lecture Notes in Networks and Systems, vol 418. Springer, Cham. https://doi.org/10.1007/978-3-030-96308-8_98
DOI: https://doi.org/10.1007/978-3-030-96308-8_98
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96307-1
Online ISBN: 978-3-030-96308-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)