Volume 99, Issue 11, pp 1105–1123

Accelerating Viterbi algorithm on graphics processing units



The Viterbi algorithm is used in a range of scientific applications, including biological sequence alignment, speech recognition, and probabilistic inference. However, its high computational complexity is a major concern. Accelerating the Viterbi algorithm is important, especially when the number of states or the length of the sequences increases significantly. In this paper, a parallel solution that improves the performance of the Viterbi algorithm is presented. This is achieved by formulating a matrix-product-based algorithm, which has been mapped to an NVIDIA graphics processing unit. The performance of different parameters and realizations is compared. The results show that the matrix product is not a viable option for a small number of states. However, for a large number of states, the matrix product solution using shared memory achieves good performance compared with the serial version.
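The abstract does not spell out the matrix product formulation, so the following is only a minimal sketch of one common way to express it, assuming log-probabilities and the max-plus (tropical) semiring, where "multiplication" is addition and "addition" is taking the maximum. All function names here are hypothetical and chosen for illustration; this is plain Python rather than the CUDA realization the paper evaluates. The sketch computes the score of the best state path twice: once with the usual serial recurrence and once as a chain of max-plus matrix products reduced pairwise.

```python
import math


def logs(rows):
    """Elementwise natural log of a list of lists (assumes strictly positive entries)."""
    return [[math.log(x) for x in row] for row in rows]


def maxplus_matmul(A, B):
    """Max-plus product: C[i][j] = max over k of (A[i][k] + B[k][j])."""
    inner, cols = len(B), len(B[0])
    return [[max(row[k] + B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def step_matrices(log_a, log_b, obs):
    """One matrix per observation after the first: M[i][j] = log_a[i][j] + log_b[j][o]."""
    n = len(log_a)
    return [[[log_a[i][j] + log_b[j][o] for j in range(n)] for i in range(n)]
            for o in obs[1:]]


def tree_product(mats):
    """Pairwise reduction of a non-empty matrix chain.

    Regrouping is valid because the max-plus product is associative; the
    products within one round are independent of each other.
    """
    while len(mats) > 1:
        nxt = [maxplus_matmul(mats[i], mats[i + 1])
               for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2:          # odd count: carry the last matrix forward
            nxt.append(mats[-1])
        mats = nxt
    return mats[0]


def viterbi_score_matrix(log_pi, log_a, log_b, obs):
    """Best-path log score via a chain of max-plus matrix products."""
    v = [[log_pi[j] + log_b[j][obs[0]] for j in range(len(log_pi))]]  # 1 x n row
    mats = step_matrices(log_a, log_b, obs)
    if mats:
        v = maxplus_matmul(v, tree_product(mats))
    return max(v[0])


def viterbi_score_serial(log_pi, log_a, log_b, obs):
    """Best-path log score via the standard serial recurrence, for comparison."""
    n = len(log_pi)
    v = [log_pi[j] + log_b[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        v = [max(v[i] + log_a[i][j] for i in range(n)) + log_b[j][o]
             for j in range(n)]
    return max(v)
```

The pairwise reduction performs full matrix-matrix products where the serial recurrence only needs vector-matrix products, so it does more arithmetic in total. Its appeal is that the products within each round can run concurrently, which is the property a GPU mapping would exploit. This sketch returns only the score; recovering the state path itself would need additional bookkeeping that is omitted here.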


Hidden Markov model, Viterbi algorithm, Matrix product, Graphics processing unit, CUDA



Copyright information

© Springer-Verlag Wien 2017

Authors and Affiliations

  1. Department of Computer Science, Government College University, Faisalabad, Pakistan
  2. Institute of Embedded Systems, Hamburg University of Technology, Hamburg, Germany
