Real-Time Comparable Phrases Searching Via the Levenshtein Distance with the Use of CUDA Technology

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 830)

Abstract

The paper presents a real-time method for finding strings similar to a given pattern. The method is based on the Levenshtein metric, computed with the Wagner–Fischer algorithm. The paper also proposes an improvement to this well-known technique: a histogram-based approach that significantly reduces calculation time without a noticeable loss of correctness. Additionally, the Wagner–Fischer algorithm has been massively parallelized using CUDA technology. The method is flexible because a task-specific vocabulary can be defined, even one made of abstract elements, so it applies well beyond alphanumeric strings. The approach appears promising for networking and security applications, as it is suitable for real-time analysis of data streams.
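The abstract combines two ingredients: the Wagner–Fischer dynamic program for the Levenshtein distance, and a histogram-based step that avoids running it where it cannot succeed. The Python sketch below is illustrative and assumes details the abstract does not give. `levenshtein` is the standard two-row Wagner–Fischer recurrence. `histogram_lower_bound` is a hypothetical stand-in for the authors' histogram approach: it uses the exact character-count bound (each insertion, deletion, or substitution lowers the count surplus and the count deficit by at most one each), so this filter never rejects a true match, whereas the authors' method, per the abstract, may accept a small loss of correctness. `find_similar`, its `max_dist` threshold, and the lowercasing (following note 3) are likewise assumptions.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Wagner–Fischer dynamic program, keeping only two rows of the matrix."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; make b the shorter sequence
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def histogram_lower_bound(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Cheap lower bound on levenshtein(a, b) from symbol counts (hypothetical filter)."""
    ha, hb = Counter(a), Counter(b)
    surplus = sum((ha - hb).values())  # symbols a has in excess (Counter '-' drops non-positives)
    deficit = sum((hb - ha).values())  # symbols b has in excess
    # Each edit removes at most one unit of surplus and one unit of deficit.
    return max(surplus, deficit)


def find_similar(pattern: str, words: Iterable[str], max_dist: int) -> list[tuple[str, int]]:
    """Return (word, distance) pairs within max_dist of pattern, ignoring case."""
    p = pattern.lower()
    hits = []
    for w in words:
        lw = w.lower()
        if histogram_lower_bound(p, lw) > max_dist:
            continue  # provably too far: skip the O(len(p) * len(w)) dynamic program
        d = levenshtein(p, lw)
        if d <= max_dist:
            hits.append((w, d))
    return hits


print(find_similar("Indiana", ["indiana", "Indy", "Diana", "banana"], 2))
# [('indiana', 0), ('Diana', 2)]
```

Because `words` can be any iterable, the same loop can consume a generator over a live token stream. `levenshtein` and `histogram_lower_bound` also accept lists of hashable tokens, which mirrors the abstract's point about vocabularies of abstract elements: with illustrative tokens, `levenshtein(["SYN", "ACK", "FIN"], ["SYN", "FIN"])` is 1.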

Notes

  1. Damerau worked at IBM on the detection and correction of spelling errors [5] before the Levenshtein metric was introduced.

  2. http://www.dailyscript.com/scripts/RaidersoftheLostArk.pdf

  3. We do not distinguish between uppercase and lowercase letters.

  4. Compute Unified Device Architecture

  5. https://docs.nvidia.com/cuda/pdf/Pascal_Tuning_Guide.pdf

  6. https://docs.nvidia.com/cuda/pdf/Volta_Tuning_Guide.pdf
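The notes refer to CUDA and to NVIDIA's tuning guides, but the abstract does not say how the Wagner–Fischer algorithm was parallelized. One common option, assumed here and not necessarily the authors' scheme, is the anti-diagonal wavefront: every cell with i + j = d depends only on diagonals d - 1 and d - 2, so a GPU can assign one thread per cell and synchronize between diagonals. The Python sketch below simulates that schedule sequentially on the CPU; the function name `levenshtein_wavefront` is hypothetical. Another common strategy for searching a large vocabulary is one thread per candidate string, each running the ordinary sequential recurrence.

```python
def levenshtein_wavefront(a: str, b: str) -> int:
    """Wagner–Fischer matrix filled one anti-diagonal at a time.

    Cells with i + j == d read only diagonals d - 1 and d - 2, so the
    iterations of the inner loop are independent of each other; on a GPU
    each could be a separate thread. Here they run sequentially.
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        D[0][j] = j  # insert all of b[:j]
    for d in range(2, n + m + 1):
        # interior cells (i, j) on this diagonal: 1 <= i <= n, 1 <= j <= m
        lo, hi = max(1, d - m), min(n, d - 1)
        for i in range(lo, hi + 1):  # order-independent: "one thread per cell"
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,          # deletion (diagonal d - 1)
                          D[i][j - 1] + 1,          # insertion (diagonal d - 1)
                          D[i - 1][j - 1] + cost)   # substitution (diagonal d - 2)
    return D[n][m]


print(levenshtein_wavefront("intention", "execution"))  # 5
```

In a real kernel each inner-loop iteration would map to a thread, with a barrier such as `__syncthreads()` between diagonals when the whole matrix fits in one thread block; the result is identical to the row-by-row recurrence.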

References

  1. Abdel-Ghaffar, K.A.S., Paluncic, F., Ferreira, H.C., Clarke, W.A.: On Helberg’s generalization of the Levenshtein code for multiple deletion/insertion error correction. IEEE Trans. Inf. Theory 58(3), 1804–1808 (2012)

  2. Andoni, A., Onak, K.: Approximating edit distance in near-linear time. In: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pp. 199–204. ACM (2009)

  3. Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 51–58. ACM (2015)

  4. Chowdhury, S.D., Bhattacharya, U., Parui, S.K.: Online handwriting recognition using Levenshtein distance metric. In: 12th International Conference on Document Analysis and Recognition (ICDAR) (2013)

  5. Damerau, F.J.: A technique for computer detection and correction of spelling errors. Commun. ACM 7(3), 171–176 (1964)

  6. Dong, J., Liu, H.: Semi-real-time algorithm for fast pattern matching. IET Image Proc. 10(12), 979–985 (2016)

  7. Fujita, O.: Metrics based on average distance between sets. Jpn. J. Ind. Appl. Math. 30(1), 1–19 (2013)

  8. Gaikwad, S., Bogiri, N.: Levenshtein distance algorithm for efficient and effective XML duplicate detection. In: International Conference on Computer, Communication and Control (IC4), pp. 1–5 (2015)

  9. Harish Kumar, B.T., Vibha, L., Venugopal, K.R.: Web page access prediction using hierarchical clustering based on modified Levenshtein distance and higher order Markov model. In: IEEE Region 10 Symposium (TENSYMP), pp. 1–6 (2016)

  10. Kim, S.-H., Cho, H.-G.: Position-restricted approximate string matching with metric Hamming distance. In: IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 108–114 (2017)

  11. Konstantinidis, S.: Computing the Levenshtein distance of a regular language. In: IEEE Information Theory Workshop (2005)

  12. Levandowsky, M., Winter, D.: Distance between sets. Nature 234(5323), 34–35 (1971)

  13. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10, 707–710 (1966)

  14. Nagata, J.: Modern General Topology, 3rd edn., vol. 33. Elsevier Science Publishers BV, Amsterdam (1985)

  15. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. (CSUR) 33(1), 31–88 (2001)

  16. Nemmour, H., Chibani, Y.: New Jaccard-distance based support vector machine kernel for handwritten digit recognition. In: 3rd International Conference on Information and Communication Technologies: From Theory to Applications, pp. 1–4 (2008)

  17. Nyirarugira, C., Choi, H.-R., Kim, J.Y., Hayes, M., Kim, T.Y.: Modified Levenshtein distance for real-time gesture recognition. In: 6th International Congress on Image and Signal Processing (CISP), pp. 974–979 (2013)

  18. Medhat, D., Hassan, A., Salama, C.: A hybrid cross-language name matching technique using novel modified Levenshtein distance. In: Tenth International Conference on Computer Engineering and Systems (ICCES), pp. 204–209 (2015)

  19. Shao, M.-M., Qian, D.-M.: The application of Levenshtein algorithm in the examination of the question bank similarity. In: International Conference on Robots and Intelligent System (ICRIS), pp. 422–424 (2016)

  20. Skłodowski, P., Żorski, W.: Movement tracking in terrain conditions accelerated with CUDA. In: Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 709–717 (2014)

  21. Cha, S.-H., Srihari, S.N.: On measuring the distance between histograms. Pattern Recogn. 35(6), 1355–1370 (2002)

  22. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. Assoc. Comput. Mach. 21, 168–173 (1974)

  23. Putra, M.E.W., Supriana, I.: Structural offline handwriting character recognition using Levenshtein distance. In: International Conference on Electrical Engineering and Informatics (ICEEI) (2015)

  24. Yujian, L., Bo, L.: A normalized Levenshtein distance metric. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1091–1095 (2007)

  25. Zhu, H., Cao, Y., Zhou, Z., Gong, M.: Parallel multi-temporal remote sensing image change detection on GPU. In: IEEE 26th International Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW) (2012)

  26. Żorski, W.: The Hough transform application including its hardware implementation. In: Advanced Concepts for Intelligent Vision Systems: Proceedings of the 7th International Conference. Lecture Notes in Computer Science, vol. 3708, pp. 460–467. Springer (2005)

  27. NVIDIA: CUDA C Programming Guide, PG-02829-001_v9.1, March 2018. https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf

  28. NVIDIA: CUDA C Best Practices Guide, DG-05603-001_v9.1, March 2018. https://docs.nvidia.com/cuda/pdf/CUDA_C_Best_Practices_Guide.pdf

Author information

Corresponding author

Correspondence to Witold Żorski.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Żorski, W., Drogosiewicz, B. (2019). Real-Time Comparable Phrases Searching Via the Levenshtein Distance with the Use of CUDA Technology. In: Kosiuczenko, P., Zieliński, Z. (eds) Engineering Software Systems: Research and Praxis. KKIO 2018. Advances in Intelligent Systems and Computing, vol 830. Springer, Cham. https://doi.org/10.1007/978-3-319-99617-2_9