Genomic Variant Annotation: A Comprehensive Review of Tools and Techniques

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 418)

Abstract

Variant annotation is the process of assigning functional information to DNA variants and mutations, and it is a crucial step in genomic sequence analysis. The outcomes of annotation matter because they can directly influence the conclusions drawn in disease studies. Once genomic sequencing data has been processed and variants have been called, it is vital to recognise the functional content of the data and then analyse it to prioritise the variants. This remains an interesting problem in computational biology, with many open challenges yet to be addressed. In this paper, we present a comprehensive review of current work on variant annotation. We detail the tools and methods that have been developed for variant annotation, along with the datasets these methods use. We also discuss open challenges and directions for future research.

Acknowledgements

The authors would like to thank Dr. Christopher Antony Cassa, Assistant Professor, Harvard Medical School, for his continued support and ideas for this manuscript.

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Hebbar, P., Sowmya, S.K. (2022). Genomic Variant Annotation: A Comprehensive Review of Tools and Techniques. In: Abraham, A., Gandhi, N., Hanne, T., Hong, TP., Nogueira Rios, T., Ding, W. (eds) Intelligent Systems Design and Applications. ISDA 2021. Lecture Notes in Networks and Systems, vol 418. Springer, Cham. https://doi.org/10.1007/978-3-030-96308-8_98
