An Efficient Cache-oblivious Parallel Viterbi Algorithm

  • Rezaul Chowdhury
  • Pramod Ganapathi
  • Vivek Pradhan
  • Jesmin Jahan Tithi
  • Yunpeng Xiao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9833)


The Viterbi algorithm finds the most likely path through a hidden Markov model given an observed sequence, and it has numerous applications. Because of its importance and high computational cost, several strategies have been developed to parallelize it on different parallel architectures. However, none of the existing Viterbi decoding algorithms designed for modern computers with cache hierarchies is simultaneously cache-efficient and cache-oblivious. An algorithm that is oblivious to machine resources (e.g., caches and processors) yet still efficient is portable across machines. In this paper, we present an efficient cache- and processor-oblivious Viterbi algorithm based on rank convergence. The algorithm builds upon the parallel Viterbi algorithm of Maleki et al. (PPoPP 2014). We evaluate our algorithm empirically by comparing it with Maleki et al.'s algorithm.
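For readers new to the problem, the baseline computation the abstract refers to can be sketched as follows. This is a minimal, sequential, textbook Viterbi decoder in log space; it is not the cache-oblivious algorithm of the paper and not Maleki et al.'s parallel variant. The function name `viterbi`, its parameter names, and the list-of-lists model layout are assumptions made for illustration.

```python
import math


def viterbi(obs, init, trans, emit):
    """Return (best_log_prob, best_path) for an observation sequence.

    obs   : list of observation indices
    init  : init[s]     = P(first state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o]  = P(observing o | state is s)
    (All names and the data layout are illustrative assumptions.)
    """
    n = len(init)

    def log(p):
        # Map zero probabilities to -inf so they never win a max.
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = log-probability of the best path that ends in state s
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1

    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Pick the predecessor r that maximizes the path score into s.
            best_r = max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
            score.append(prev[best_r] + log(trans[best_r][s]) + log(emit[s][o]))
            ptr.append(best_r)
        back.append(ptr)

    # Trace the back-pointers from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

As a usage example with a toy two-state model (hypothetical numbers), `viterbi([0, 1, 2], [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])` returns the path `[0, 0, 1]` with probability 0.01512 (returned as its logarithm). Each time step in this sketch reads the whole previous score vector and the full transition matrix, which is the kind of access pattern that cache-efficient variants aim to improve.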


Viterbi algorithm, Cache-efficient, Cache-oblivious, Recursive, Divide-and-conquer, Parallel, Multi-instance, Rank convergence



Chowdhury and Ganapathi were supported in part by NSF grants CCF-1162196, CCF-1439084 and CNS-1553510.


  1. Performance Application Programming Interface (PAPI).
  2. Bille, P., Stöckel, M.: Fast and cache-oblivious dynamic programming with local dependencies. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 131–142. Springer, Heidelberg (2012)
  3. Burge, C., Karlin, S.: Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268(1), 78–94 (1997)
  4. Cherng, C., Ladner, R.E.: Cache efficient simple dynamic programming. In: Proceedings of AofA, pp. 49–58 (2005)
  5. Chin, W., Tan, S., Teo, Y.: Deriving efficient parallel programs for complex recurrences. In: Proceedings of PASCO, pp. 101–110 (1997)
  6. Chin, W.N., Darlington, J., Guo, Y.: Parallelizing conditional recurrences. In: Fraigniaud, P., Mignotte, A., Bougé, L., Robert, Y. (eds.) Euro-Par 1996. LNCS, vol. 1123, pp. 579–586. Springer, Heidelberg (1996)
  7. Chowdhury, R.A., Ganapathi, P., Tithi, J.J., Bachmeier, C., Kuszmaul, B.C., Leiserson, C.E., Solar-Lezama, A., Tang, Y.: AutoGen: automatic discovery of cache-oblivious parallel recursive algorithms for solving dynamic programs. In: Proceedings of PPoPP, p. 10. ACM (2016)
  8. Chowdhury, R.A.: Cache-efficient algorithms and data structures: theory and experimental evaluation. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin (2007)
  9. Chowdhury, R.A., Ramachandran, V.: Cache-oblivious dynamic programming. In: Proceedings of SODA, pp. 591–600 (2006)
  10. Chowdhury, R.A., Ramachandran, V.: Cache-efficient dynamic programming algorithms for multicores. In: Proceedings of SPAA, pp. 207–216 (2008)
  11. Chowdhury, R.A., Ramachandran, V.: The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation. Theory Comput. Syst. 47(4), 878–919 (2010)
  12. Chowdhury, R.A., Ramachandran, V., Silvestri, F., Blakeley, B.: Oblivious algorithms for multicores and networks of processors. J. Parallel Distrib. Comput. 73(7), 911–925 (2013)
  13. Chowdhury, R.A., Le, H.S., Ramachandran, V.: Cache-oblivious dynamic programming for bioinformatics. IEEE/ACM Trans. Comput. Biol. Bioinf. 7(3), 495–510 (2010)
  14. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2001)
  15. Costello, D.J., Hagenauer, J., Imai, H., Wicker, S.B.: Applications of error-control coding. IEEE Trans. Inf. Theory 44(6), 2531–2560 (1998)
  16. Cutting, D., Kupiec, J., Pedersen, J., Sibun, P.: A practical part-of-speech tagger. In: Proceedings of ANLC, pp. 133–140. Association for Computational Linguistics (1992)
  17. Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)
  18. Ferreira, M., Roma, N., Russo, L.M.: Cache-oblivious parallel SIMD Viterbi decoding for sequence search in HMMER. Bioinformatics 15(1), 165 (2014)
  19. Fisher, A.L., Ghuloum, A.M.: Parallelizing complex scans and reductions. ACM SIGPLAN Notices 29(6), 135–146 (1994)
  20. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proceedings of FOCS, pp. 285–297 (1999)
  21. Heller, J., Jacobs, I.: Viterbi decoding for satellite and space communication. IEEE Trans. Commun. Technol. 19(5), 835–848 (1971)
  22. Klein, D., Manning, C.D.: A* parsing: fast exact Viterbi parse selection. In: Proceedings of NAACL, pp. 40–47 (2003)
  23. Kobayashi, H.: Application of probabilistic decoding to digital magnetic recording systems. IBM J. Res. Dev. 15(1), 64–74 (1971)
  24. Krogh, A., Larsson, B., Von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305(3), 567–580 (2001)
  25. Liu, C.: cuHMM: A CUDA implementation of hidden Markov model training and classification. The Chronicle of Higher Education (2009)
  26. Maleki, S., Musuvathi, M., Mytkowicz, T.: Parallelizing dynamic programming through rank convergence. In: Proceedings of PPoPP, pp. 219–232 (2014)
  27. Maleki, S., Musuvathi, M., Mytkowicz, T.: Low-rank methods for parallelizing dynamic programming algorithms. ACM Trans. Parallel Comp. 2(4), 26 (2016)
  28. Nam, H., Kwak, H.: Viterbi decoder for a high definition television (1998). US Patent 5,844,945
  29. Ohler, U., Niemann, H., Liao, G.C., Rubin, G.M.: Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 17(Suppl. 1), S199–S206 (2001)
  30. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et al.: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15(8), 1034–1050 (2005)
  31. Tan, G., Feng, S., Sun, N.: Locality and parallelism optimization for dynamic programming algorithm in bioinformatics. In: Proceedings of SC, p. 78 (2006)
  32. Tang, S., Yu, C., Sun, J., Lee, B.S., Zhang, T., Xu, Z., Wu, H.: EasyPDP: an efficient parallel dynamic programming runtime system for computational biology. IEEE Trans. Parallel Distrib. Syst. 23(5), 862–872 (2012)
  33. Tang, Y., Chowdhury, R.A., Luk, C.K., Leiserson, C.E.: Coding stencil computations using the Pochoir stencil-specification language. In: Proceedings of HotPar (2011)
  34. Tithi, J.J., Ganapathi, P., Talati, A., Aggarwal, S., Chowdhury, R.A.: High-performance energy-efficient recursive dynamic programming with matrix-multiplication-like flexible kernels. In: Proceedings of IPDPS (2015)
  35. Treibig, J., Hager, G., Wellein, G.: Likwid: a lightweight performance-oriented tool suite for x86 multicore environments. In: Proceedings of ICPPW, pp. 207–216 (2010)
  36. Viterbi, A.J.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)
  37. Viterbi, A.J.: Convolutional codes and their performance in communication systems. IEEE Trans. Commun. Technol. 19(5), 751–772 (1971)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Rezaul Chowdhury (1)
  • Pramod Ganapathi (1), corresponding author
  • Vivek Pradhan (1)
  • Jesmin Jahan Tithi (1)
  • Yunpeng Xiao (1)

  1. Department of Computer Science, Stony Brook University, New York, USA
