Volume 29, Issue 2-3, pp 227-246

String Matching Over Compressed Text on Handheld Devices Using Tagged Sub-Optimal Code (TSC)


This paper presents Tagged Sub-optimal Code (TSC), a new coding technique that speeds up string matching over compressed databases on personal digital assistants (PDAs). TSC is a variable-length sub-optimal code that supports the minimal prefix property. It always determines its codeword boundaries without traversing a tree or lookup table. The TSC technique may benefit many types of applications by speeding up both string matching over compressed text and the decoding process. This paper also presents two algorithms for string matching over compressed text: one using TSC (SCTT) and one using the Byte Pair Encoding (BPE) technique (SCTB).

Several experiments were conducted to compare the performance of TSC, BPE, and Huffman code. Several PDA databases with different record sizes were used: the well-known Calgary dataset and a set of small-sized PDA databases. Experimental results show that SCTT is almost twice as fast as the Huffman-based algorithm. SCTT also matches the search time of searching uncompressed databases and is faster than the SCTB algorithm. For frequently updated PDA databases such as phone books, to-do lists, and memos, SCTT is the recommended method regardless of the average record length, since compressing the updated records with BPE incurs significant delays compared to TSC.
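The abstract does not give the TSC construction itself, so the following Python sketch is only an illustration of the general idea it describes: a tagged variable-length code whose codeword boundaries can be read directly from the bytes, which lets a pattern be matched in the compressed data without decoding it. It uses a generic word-based, end-tagged byte code as a stand-in, not the code defined in the paper. The names `TAG`, `encode_rank`, `build_codebook`, `compress`, and `find_all` are hypothetical.

```python
from collections import Counter

TAG = 0x80  # assumption: the high bit marks the LAST byte of each codeword


def encode_rank(rank):
    """Encode a frequency rank as base-128 digits; only the final byte is tagged."""
    digits = [rank % 128]
    rank //= 128
    while rank > 0:
        digits.append(rank % 128)
        rank //= 128
    digits.reverse()
    digits[-1] |= TAG
    return bytes(digits)


def build_codebook(words):
    """Give more frequent words smaller ranks, and therefore shorter codewords."""
    ranked = [word for word, _ in Counter(words).most_common()]
    return {word: encode_rank(i) for i, word in enumerate(ranked)}


def compress(words, codebook):
    """Concatenate the codewords of a word sequence."""
    return b"".join(codebook[word] for word in words)


def find_all(compressed, pattern_words, codebook):
    """Return byte offsets where the pattern occurs on a codeword boundary."""
    needle = compress(pattern_words, codebook)
    hits = []
    pos = compressed.find(needle)
    while pos != -1:
        # A match is aligned if it starts the stream or follows a tagged byte.
        # No tree or lookup table is consulted to find the boundary.
        if pos == 0 or compressed[pos - 1] & TAG:
            hits.append(pos)
        pos = compressed.find(needle, pos + 1)
    return hits


text = "call alice call bob call alice".split()
book = build_codebook(text)
data = compress(text, book)
hits = find_all(data, ["call", "alice"], book)
```

In this sketch the pattern is compressed once and then located with an ordinary byte search. The tag bit is what rules out false matches that begin in the middle of a longer codeword.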

Abdeghani Bellaachia is an associate professor in the Computer Science Department at the George Washington University. He received his Diploma of Engineering from the Mohammadia School of Engineering in Rabat, Morocco, in 1983, and the MS and Doctor of Science degrees from the George Washington University in 1992. He was the chief architect of the Arabization of the Palm-OS platform. His research interests include data mining, multi-lingual information retrieval systems, cross-language retrieval systems, database management systems, bio-informatics, design and analysis of algorithms, handheld computing, and parallel processing.
Iehab AL Rassan works for the Ministry of Higher Education in Saudi Arabia as director of the information technology department. He received his B.A. in Computer Information Systems from King Faisal University. He then received his M.S. in Computer Science from Fairleigh Dickinson University and his Doctor of Science in Computer Science from the George Washington University. His research interests include coding theories, information retrieval, string-matching algorithms, data compression, and handheld computing.