Real-Time Systems, Volume 29, Issue 2–3, pp 227–246

String Matching Over Compressed Text on Handheld Devices Using Tagged Sub-Optimal Code (TSC)

Abstract

This paper presents Tagged Sub-optimal Code (TSC), a new coding technique that speeds up string matching over compressed databases on personal digital assistants (PDAs). TSC is a variable-length sub-optimal code that supports the minimal prefix property. It always determines its codeword boundaries without traversing a tree or consulting a lookup table. The TSC technique may benefit many types of applications by speeding up both string matching over compressed text and the decoding process. This paper also presents two algorithms for string matching over compressed text: SCTT, which uses TSC, and SCTB, which uses the Byte Pair Encoding (BPE) technique.

Several experiments were conducted to compare the performance of TSC, BPE, and Huffman code. Several PDA databases with different record sizes were used: the well-known Calgary dataset and a set of small PDA databases. Experimental results show that SCTT is almost twice as fast as the Huffman-based algorithm. SCTT also matches the search time of searching uncompressed databases and is faster than the SCTB algorithm. For frequently updated PDA databases such as phone books, to-do lists, and memos, SCTT is the recommended method regardless of the average record length, since the time required to compress the updated records using BPE poses significant delays compared to TSC.
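
For readers unfamiliar with the Byte Pair Encoding baseline named in the abstract, the following is a minimal sketch of the general BPE idea: repeatedly replace the most frequent adjacent byte pair with a byte value that does not occur in the data. It is an illustration only, not the paper's SCTB algorithm and not TSC. The function names `bpe_compress` and `bpe_decompress` and the `max_rules` parameter are assumptions made for this example.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 256):
    """Illustrative byte pair encoding (hypothetical helper, not the paper's code).

    Returns the compressed bytes and the substitution rules needed to decode.
    """
    seq = list(data)
    present = set(seq)
    # Byte values absent from the input can serve as new pair symbols.
    unused = [b for b in range(256) if b not in present]
    rules = {}  # new symbol -> (left, right)

    while unused and len(rules) < max_rules:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a further rule would not pay off
        sym = unused.pop()
        rules[sym] = pair

        # Rewrite the sequence left to right, replacing the chosen pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out

    return bytes(seq), rules


def bpe_decompress(data: bytes, rules: dict) -> bytes:
    """Expand pair symbols recursively (via an explicit stack) back to the input."""
    stack = list(reversed(data))
    out = bytearray()
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)
            stack.append(left)
        else:
            out.append(s)
    return bytes(out)


if __name__ == "__main__":
    record = b"abracadabra abracadabra"
    packed, table = bpe_compress(record)
    print(len(record), len(packed), bpe_decompress(packed, table) == record)
```

Each update to a record requires recounting pairs over the data, which gives some intuition for why the abstract reports compression delays for BPE on frequently updated databases.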

Keywords

compress, handheld, searching compressed text, PDA, string matching, pattern

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. Computer Science Department, George Washington University, Washington, U.S.A.
