
PathRacer: Racing Profile HMM Paths on Assembly Graph

  • Conference paper
Algorithms for Computational Biology (AlCoB 2019)

Abstract

Large databases of profile hidden Markov models (pHMMs) have recently emerged. These pHMMs may represent, for example, the sequences of antibiotic resistance genes or allelic variations among highly conserved housekeeping genes used for strain typing. A typical application of such a database is to align contigs to a pHMM in the hope that the sequence of the gene of interest lies within a single contig. This condition is often violated for metagenomes, which prevents the effective use of such databases.

We present PathRacer, a novel standalone tool that aligns a profile HMM directly to the assembly graph (performing codon translation on the fly for amino acid pHMMs). The tool reports the set of most probable paths traversed by an HMM through the whole assembly graph, regardless of whether the sequence of interest is encoded on a single contig or scattered across a set of edges, thereby significantly improving the recovery of sequences of interest even from fragmented metagenome assemblies.
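To make the idea of "most probable paths through a graph" concrete, here is a minimal sketch under strong simplifying assumptions. It is not PathRacer's algorithm: the profile has match states only (no insertions, deletions, or codon translation), every graph node carries a single character, and only the single best path is returned. The function name `best_path` and the data layout are hypothetical and chosen purely for illustration.

```python
import math

def best_path(labels, edges, profile, floor=1e-9):
    """Toy Viterbi-style search on a character-labeled graph.

    labels:  dict node -> single character carried by that node
    edges:   dict node -> list of successor nodes
    profile: list of dicts (one per match state) mapping character -> probability
    Assumption: match states only; unseen characters get probability `floor`.
    """
    def emit(j, ch):
        return math.log(profile[j].get(ch, floor))

    # score[j][v]: best log-probability of a path ending at v aligned to profile[0..j]
    score = [{v: emit(0, ch) for v, ch in labels.items()}]
    back = [{}]
    for j in range(1, len(profile)):
        cur, ptr = {}, {}
        for u, s in score[j - 1].items():
            for v in edges.get(u, []):
                cand = s + emit(j, labels[v])
                if v not in cur or cand > cur[v]:
                    cur[v], ptr[v] = cand, u
        score.append(cur)
        back.append(ptr)

    if not score[-1]:
        return None  # no path in the graph is long enough for the profile

    # Trace back from the best-scoring end node.
    end = max(score[-1], key=score[-1].get)
    path = [end]
    for j in range(len(profile) - 1, 0, -1):
        path.append(back[j][path[-1]])
    path.reverse()
    return path, score[-1][end]

# A small branching graph: node 1 leads to 2 or 3, both lead to 4.
labels = {1: "A", 2: "C", 3: "G", 4: "T"}
edges = {1: [2, 3], 2: [4], 3: [4]}
profile = [{"A": 0.9}, {"G": 0.8, "C": 0.1}, {"T": 0.9}]
path, logp = best_path(labels, edges, profile)
```

Because every step advances one profile position, the recurrence terminates even when the graph contains cycles, which is one reason a graph can be searched directly rather than by enumerating contigs first. In this example the branch through node 3 wins, since the second profile position favors `G` over `C`.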


Notes

  1. So far only GFA from de Bruijn graph assemblers like SPAdes and MegaHit is supported, but we will address this restriction in the next PathRacer versions.
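For readers unfamiliar with the GFA input mentioned in the note, the following is a minimal sketch of reading segments and links from GFA 1 text. It assumes the standard tab-separated layout (`S` lines carry a name and a sequence; `L` lines carry source, source orientation, target, target orientation, and an overlap CIGAR) and ignores optional tags and all other record types. The function name `parse_gfa` is hypothetical, and this is not PathRacer's own parser.

```python
def parse_gfa(lines):
    """Collect segments and links from an iterable of GFA 1 lines.

    Returns (segments, links) where segments maps name -> sequence and
    links is a list of (from, from_orient, to, to_orient, overlap) tuples.
    Assumption: optional tags and other record types are simply skipped.
    """
    segments = {}
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 6:
            links.append(tuple(fields[1:6]))
    return segments, links

# Two segments joined by one link with a 2-base overlap.
example = [
    "H\tVN:Z:1.0\n",
    "S\t1\tACGT\n",
    "S\t2\tGTTA\n",
    "L\t1\t+\t2\t-\t2M\n",
]
segments, links = parse_gfa(example)
```

In graphs produced by de Bruijn graph assemblers, the overlap between linked segments is typically a fixed number of bases tied to the k-mer size, which a downstream tool would need to account for when spelling out the sequence of a path.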

Acknowledgments

This work was supported by the Russian Science Foundation (grant 19-14-00172). The authors would like to extend a special thanks to Sergey Nurk and Tatiana Dvorkina for all the fruitful discussions that were of great help in improving the algorithms.

Corresponding author

Correspondence to Anton Korobeynikov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Shlemov, A., Korobeynikov, A. (2019). PathRacer: Racing Profile HMM Paths on Assembly Graph. In: Holmes, I., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2019. Lecture Notes in Computer Science, vol 11488. Springer, Cham. https://doi.org/10.1007/978-3-030-18174-1_6

  • DOI: https://doi.org/10.1007/978-3-030-18174-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18173-4

  • Online ISBN: 978-3-030-18174-1

  • eBook Packages: Computer Science, Computer Science (R0)
