Boosting Pattern Matching Performance via k-bit Filtering

  • Conference paper

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 62))

Abstract

This study explores an alternative way of storing text files in a different format that speeds up searching. The input file is decomposed into two parts, a filter and a payload. The filter is composed of the k most informative bits of each byte of the original file; the remaining bits form the payload. The most informative bits are selected according to their entropy. When an input pattern is to be searched in the new file structure, the same decomposition is performed on the pattern. The filter part of the pattern is searched for in the filter part of the file, followed by verification of the payload at the matching positions. Experiments conducted on natural language texts, plain ASCII DNA sequences, and random byte sequences showed that searching with the proposed scheme is on average two times faster than the tested exact pattern matching algorithms.
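The decomposition and the two-phase search described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, "entropy" is taken here to mean the binary Shannon entropy of each of the 8 bit positions measured over the whole file, the filter and payload values are stored one per byte rather than bit-packed, and Python's built-in `bytes.find` stands in for whichever exact matching algorithm would scan the filter.

```python
import math


def bit_entropies(data: bytes) -> list[float]:
    """Binary Shannon entropy of each of the 8 bit positions over data.

    Assumption: per-position entropy is the selection criterion; the
    abstract only says bits are selected "according to their entropy".
    """
    n = len(data)
    entropies = []
    for pos in range(8):
        ones = sum((b >> pos) & 1 for b in data)
        p = ones / n if n else 0.0
        if p in (0.0, 1.0):
            entropies.append(0.0)  # a constant bit carries no information
        else:
            entropies.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return entropies


def choose_filter_bits(data: bytes, k: int) -> list[int]:
    """Return the k bit positions with the highest entropy, in ascending order."""
    entropies = bit_entropies(data)
    ranked = sorted(range(8), key=lambda pos: entropies[pos], reverse=True)
    return sorted(ranked[:k])


def extract(byte: int, positions: list[int]) -> int:
    """Gather the bits of byte at the given positions into a compact value."""
    out = 0
    for i, pos in enumerate(positions):
        out |= ((byte >> pos) & 1) << i
    return out


def decompose(data: bytes, filter_bits: list[int]) -> tuple[bytes, bytes]:
    """Split data into (filter, payload), one value per input byte.

    Simplification: values are not bit-packed, so this sketch shows the
    logic of the scheme rather than its space or speed characteristics.
    """
    payload_bits = [pos for pos in range(8) if pos not in filter_bits]
    filt = bytes(extract(b, filter_bits) for b in data)
    payload = bytes(extract(b, payload_bits) for b in data)
    return filt, payload


def search(text_filter: bytes, text_payload: bytes,
           pattern: bytes, filter_bits: list[int]) -> list[int]:
    """Find all occurrences: scan the filter, then verify the payload."""
    m = len(pattern)
    if m == 0:
        return []
    pat_filter, pat_payload = decompose(pattern, filter_bits)
    hits = []
    start = text_filter.find(pat_filter)
    while start != -1:
        # A filter match is only a candidate; the payload bits decide.
        if text_payload[start:start + m] == pat_payload:
            hits.append(start)
        start = text_filter.find(pat_filter, start + 1)
    return hits
```

To use the sketch, call `choose_filter_bits(text, k)` once per file, store the result of `decompose(text, filter_bits)`, and pass both parts to `search` together with the raw pattern. Because the filter and payload bit positions are disjoint and together cover all 8 bits, a position matches in both parts exactly when the original bytes match, so the two-phase search returns the same positions as a direct search of the original file.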

Author information

Corresponding author

Correspondence to M. Oğuzhan Külekci.

Copyright information

© 2011 Springer Science+Business Media B.V.

About this paper

Cite this paper

Külekci, M.O., Vitter, J.S., Xu, B. (2011). Boosting Pattern Matching Performance via k-bit Filtering. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds) Computer and Information Sciences. Lecture Notes in Electrical Engineering, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9794-1_6

  • DOI: https://doi.org/10.1007/978-90-481-9794-1_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-9793-4

  • Online ISBN: 978-90-481-9794-1

  • eBook Packages: Engineering, Engineering (R0)
