Approximate Search in Digital Forensics

  • Slobodan Petrović


Searching very large amounts of data plays a crucial role in digital forensics in general, and in network forensics in particular. Search is used to find evidence in digital media as well as traces of attacks in computer memory and network traffic. The amount of data to be processed is not the only challenge a search algorithm faces. Variations in the data make the task even more difficult, and these variations have many different causes: transmission errors, differences in the implementation of various protocols, different data formatting across sources of information, attempts to hide the traces of criminal activities, and so on. In some cases, especially in network forensics, the velocity of the data further complicates the task. Therefore, the key success factors of a digital forensics investigation are the use of sophisticated, efficiently implemented search algorithms and the reduction of the quantity of data to process. This chapter explains constrained approximate bit-parallel search algorithms that can both reduce the size of the data sets to process and efficiently process the remaining data. We analyze the capabilities of these algorithms to correctly detect evidence or traces of attacks while keeping the false-positive rate at an acceptable level.
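As a rough illustration of the bit-parallel approximate search that the chapter builds on, the following Python sketch implements the classic unconstrained Shift-And search with up to k errors, in the style of Wu and Manber. It is not the chapter's constrained algorithm, which additionally restricts the allowed edit operations. The function name `approx_search`, its parameters, and the sample strings are hypothetical and chosen only for this example.

```python
def approx_search(pattern, text, k):
    """Return the end positions in `text` where `pattern` matches
    with at most `k` errors (insertions, deletions, substitutions).

    Illustrative sketch only: unconstrained search, non-empty pattern assumed.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keep only the m state bits
    accept = 1 << (m - 1)        # bit m-1 set means the whole pattern matched

    # char_mask[c] has bit i set when pattern[i] == c
    char_mask = {}
    for i, ch in enumerate(pattern):
        char_mask[ch] = char_mask.get(ch, 0) | (1 << i)

    # rows[j], bit p set: pattern[0..p] matches a suffix of the text read
    # so far with at most j errors. Before any text is read, a prefix of
    # length up to j can be "matched" by deleting all of its characters.
    rows = [((1 << j) - 1) & mask for j in range(k + 1)]

    hits = []
    for pos, ch in enumerate(text):
        bc = char_mask.get(ch, 0)
        prev_old = rows[0]                       # row j-1 before this character
        rows[0] = ((rows[0] << 1) | 1) & bc      # exact Shift-And step
        for j in range(1, k + 1):
            cur_old = rows[j]
            rows[j] = (
                (((cur_old << 1) | 1) & bc)      # match
                | prev_old                       # extra character in the text
                | (prev_old << 1)                # substituted character
                | (rows[j - 1] << 1)             # character missing from the text
                | 1                              # first pattern position costs one error
            ) & mask
            prev_old = cur_old
        if rows[k] & accept:
            hits.append(pos)
    return hits


# "atack" is "attack" with one character missing, so one error is enough.
print(approx_search("attack", "an atack was seen", 1))  # prints [7]
print(approx_search("attack", "an atack was seen", 0))  # prints []
```

Each text character costs a constant number of bitwise operations per error level, which is what makes this family of algorithms attractive for high-volume data. Python integers are unbounded, so the explicit `mask` stands in for the fixed machine word that a lower-level implementation would use.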


Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Norwegian University of Science and Technology (NTNU), Trondheim, Norway
