PathRacer: Racing Profile HMM Paths on Assembly Graph

  • Alexander Shlemov
  • Anton Korobeynikov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11488)


Large databases of profile Hidden Markov Models (pHMMs) have recently emerged. These pHMMs may represent, for example, the sequences of antibiotic resistance genes or allelic variations among highly conserved housekeeping genes used for strain typing. A typical application of such a database is to align contigs to a pHMM in the hope that the sequence of the gene of interest lies within a single contig. This condition is often violated for metagenomes, which prevents effective use of such databases.

We present PathRacer, a novel standalone tool that aligns a profile HMM directly to the assembly graph (performing codon translation on the fly for amino acid pHMMs). The tool reports the set of most probable paths traversed by an HMM through the whole assembly graph, regardless of whether the sequence of interest is encoded on a single contig or scattered across a set of edges, thereby significantly improving the recovery of sequences of interest even from fragmented metagenome assemblies.
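To make the idea of scoring graph paths against a profile concrete, here is a minimal sketch of a Viterbi-style dynamic program over a small directed graph. It is an illustration under strong simplifying assumptions and not PathRacer's algorithm: the profile has match states only (no insert or delete states), each node carries one nucleotide, there is no codon translation, and only the single best path is returned instead of a set of most probable paths. The function name `best_path`, the toy graph, and the emission probabilities are all hypothetical.

```python
import math


def best_path(labels, edges, profile):
    """Return (path, log_prob) of the best path for a match-only profile.

    labels:  dict node -> letter carried by that node
    edges:   list of directed edges (u, v)
    profile: list of dicts letter -> emission probability, one per position
    """
    preds = {v: [] for v in labels}
    for u, v in edges:
        preds[v].append(u)

    def emit(j, v):
        p = profile[j].get(labels[v], 0.0)
        return math.log(p) if p > 0 else -math.inf

    # score[j][v]: best log-probability of a path ending at node v
    # after emitting profile positions 0..j.
    score = [{v: emit(0, v) for v in labels}]
    back = [{v: None for v in labels}]
    for j in range(1, len(profile)):
        layer, ptr = {}, {}
        for v in labels:
            best_u = max(preds[v], key=lambda u: score[j - 1][u], default=None)
            prev = score[j - 1][best_u] if best_u is not None else -math.inf
            layer[v] = prev + emit(j, v)
            ptr[v] = best_u
        score.append(layer)
        back.append(ptr)

    end = max(labels, key=lambda v: score[-1][v])
    if score[-1][end] == -math.inf:
        return None, -math.inf  # no path of the required length exists

    # Follow back-pointers from the best end node to recover the path.
    path = [end]
    for j in range(len(profile) - 1, 0, -1):
        path.append(back[j][path[-1]])
    path.reverse()
    return path, score[-1][end]


# Toy graph (hypothetical): the branch c -> g / c -> x rejoins at t,
# so the target "ACGT" is spread over several nodes.
labels = {"a": "A", "c": "C", "g": "G", "x": "T", "t": "T"}
edges = [("a", "c"), ("c", "g"), ("c", "x"), ("g", "t"), ("x", "t")]
consensus = "ACGT"
profile = [{b: (0.7 if b == ch else 0.1) for b in "ACGT"} for ch in consensus]

path, log_prob = best_path(labels, edges, profile)
print(path, log_prob)
```

Because the table is layered by profile position, the recurrence stays well defined even when the graph contains cycles, which assembly graphs commonly do. Supporting insertions, deletions, and a ranked set of paths would require a richer state space than this sketch uses.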


Keywords: Profile HMM; Graph alignment; Set of most probable paths



This work was supported by the Russian Science Foundation (grant 19-14-00172). The authors would like to extend special thanks to Sergey Nurk and Tatiana Dvorkina for the fruitful discussions that were of great help in improving the algorithms.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
