Bioinformatics for Analysis of Poxvirus Genomes

  • Shin-Lin Tu
  • Chris Upton
Part of the Methods in Molecular Biology book series (MIMB, volume 2023)


In recent years, there have been numerous technological advances in the field of molecular biology; these include next- and third-generation sequencing of DNA genomes and mRNA transcripts and mass spectrometry of proteins. Of these, it is perhaps genome sequencing that has had the greatest impact on virologists. As of 2017, more than 480 complete genome sequences of poxviruses have been generated, and they are used constantly, in many different ways, by almost all molecular virologists. Matching this growth in data acquisition is an explosion in the relatively new field of bioinformatics, which provides databases to store and organize these valuable and expensive data and algorithms to analyze them. For the bench virologist, access to intuitive, easy-to-use software is often critical for performing bioinformatics-based experiments. Three common hurdles for the researcher are (1) selection, retrieval, and reformatting of genomics data from large databases; (2) use of tools to compare and analyze the genomics data; and (3) display and interpretation of complex sets of results. This chapter is directed at the bench virologist and describes software that helps overcome these obstacles, with a focus on the comparison and analysis of poxvirus genomes. Although poxvirus genomes are stored in public databases such as GenBank, this resource can be cumbersome and tedious to use if large amounts of data must be collected. Therefore, we also highlight the Viral Orthologous Clusters database system and integrated tools that we developed specifically for the management and analysis of complete viral genomes.


Poxvirus, Vaccinia virus, Smallpox, Bioinformatics, Genomics, Dotplot, Multiple sequence alignment, MSA, VOCs, VGO, BBB, BLAST, JDotter



The authors wish to thank the many programmers, researchers, and students who have contributed to the Virus Bioinformatics Resource software. This work has been supported by funds from the Natural Sciences and Engineering Research Council of Canada. Drs. C. Upton, R. M. L. Buller, and E. J. Lefkowitz were the original developers of the Poxvirus Bioinformatics Resource Center.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Biochemistry and Microbiology, University of Victoria, Victoria, Canada
