A Two-Tire Index Structure for Approximate String Matching with Block Moves
- Cite this paper as:
- Wang B., Xie L., Wang G. (2009) A Two-Tire Index Structure for Approximate String Matching with Block Moves. In: Chen L., Liu C., Liu Q., Deng K. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5667. Springer, Berlin, Heidelberg
Many applications need to solve the problem of approximate string matching with block moves. Computing the block edit distance between two strings is NP-complete, so our goal is to filter out as many non-candidate strings as possible before any distance is computed. Building on two mature filtering strategies, frequency distance and positional q-grams, we propose a two-tier index structure that applies the two filters more efficiently. We give a full specification of the index structure, including how to choose the character order to improve filtering power and how to balance the number of strings across clusters. We present experiments on real data sets to evaluate our technique and show that the proposed index structure performs well.
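As an illustration of the first filter named in the abstract, the following minimal sketch computes the frequency distance between two strings and uses it to discard non-candidates. It is not the paper's index structure: the function names `frequency_distance`, `frequency_filter`, and `positional_qgrams`, and the threshold parameter `k`, are assumptions introduced here. The sketch relies on the fact that a block move leaves character counts unchanged, while each single-character insertion, deletion, or substitution changes the surplus and the deficit by at most one, so the frequency distance never exceeds the block edit distance.

```python
from collections import Counter


def frequency_distance(s: str, t: str) -> int:
    """Lower bound on the edit distance (with or without block moves)."""
    fs, ft = Counter(s), Counter(t)
    # Counter subtraction keeps only positive counts.
    surplus = sum((fs - ft).values())  # characters s has in excess of t
    deficit = sum((ft - fs).values())  # characters t has in excess of s
    return max(surplus, deficit)


def frequency_filter(query: str, candidates: list[str], k: int) -> list[str]:
    """Keep only strings whose frequency distance to the query is at most k.

    Strings that fail this test cannot be within distance k, so they can be
    dropped without computing the expensive block edit distance.
    """
    return [c for c in candidates if frequency_distance(query, c) <= k]


def positional_qgrams(s: str, q: int) -> list[tuple[int, str]]:
    """Return every q-gram of s together with its start position.

    Only the extraction step is shown; how positions are compared under
    block moves is specific to the paper and is not reproduced here.
    """
    return [(i, s[i:i + q]) for i in range(len(s) - q + 1)]


if __name__ == "__main__":
    strings = ["cdab", "abxy", "wxyz"]
    print(frequency_filter("abcd", strings, k=1))
    print(positional_qgrams("abcd", 2))
```

Here "cdab" survives the filter because it is a block move of "abcd" and has the same character counts, whereas the other two strings differ from the query in at least two characters.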