1 Introduction

String matching is a performance bottleneck and a key issue in many important fields, so the design of exact single-pattern matching algorithms is of great significance. High-performance matching is in particular demand in our areas of focus, real-time information processing and security.

For a string \( S = s_{0} s_{1} \ldots s_{m - 1} \) and \( 0 < k \le m \), we denote the prefix, suffix and factor of S of length k by \( pref(S,k) \), \( suff(S,k) \) and \( fac(S,k) \), respectively. SCW denotes the text in the sliding window. All algorithms in this paper are exact single-pattern matching algorithms: given an alphabet Σ (|Σ| = σ, where Σ* is the closure of Σ), a text T = \( t_{0} t_{1} \ldots t_{n - 1} \) of length n and a pattern \( P = p_{0} p_{1} \ldots p_{m - 1} \) of length m, with P, T ∈ Σ*, the task is to find, among all possible sliding windows, those for which \( P[i] = SCW[i] \) for \( \forall i \in [0 \ldots m - 1] \). Algorithms are described in C/C++.

This paper improves three classical suffix matching algorithms: Quick Search [1], Tuned BM [2] and BMHq [3]. We add Multi-window [4] and an integer comparison method to them. The resulting three series of algorithms, named QSMI, TBMMI and BMHqMI, are very fast for short patterns.

2 Accelerating Method: Multi-window and Integer Comparison

Multi-window [4] (shown in Fig. 1) divides the text equally into k/2 areas in a k-window mechanism (k is even). Each area has two windows, which match from the two ends of the area toward the middle until they overlap, and the windows carry out their matching procedures in turn. It is a general acceleration method for string matching.
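The two-window case can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of [4]: the function name two_window_count is hypothetical, the left window is assumed to use the Quick Search bad-character shift, and the right window is assumed to use a mirrored shift indexed by the character just before the window. The two windows take turns until they cross, and a position reached by both is counted once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Count occurrences of pat in text with two windows that start at the two
// ends of the text and move toward each other, taking turns.
std::size_t two_window_count(const std::string& text, const std::string& pat) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(text.size());
    const std::ptrdiff_t m = static_cast<std::ptrdiff_t>(pat.size());
    assert(m >= 1);
    if (n < m) return 0;

    // fwd: Quick Search shift, indexed by the character right after the window.
    // bwd: mirrored shift (an assumption of this sketch), indexed by the
    //      character right before the window.
    std::ptrdiff_t fwd[256], bwd[256];
    for (int c = 0; c < 256; ++c) fwd[c] = bwd[c] = m + 1;
    for (std::ptrdiff_t k = 0; k < m; ++k)
        fwd[static_cast<unsigned char>(pat[k])] = m - k;   // last occurrence wins
    for (std::ptrdiff_t k = m - 1; k >= 0; --k)
        bwd[static_cast<unsigned char>(pat[k])] = k + 1;   // first occurrence wins

    auto matches_at = [&](std::ptrdiff_t pos) {
        return std::memcmp(text.data() + pos, pat.data(),
                           static_cast<std::size_t>(m)) == 0;
    };

    std::ptrdiff_t i = 0;      // start of the left window, moves right
    std::ptrdiff_t j = n - m;  // start of the right window, moves left
    std::size_t count = 0;
    while (i <= j) {
        if (matches_at(i)) ++count;
        if (j != i && matches_at(j)) ++count;  // do not count a shared position twice
        // Slide each window; fall back to a shift of 1 at the text boundary.
        i += (i + m < n) ? fwd[static_cast<unsigned char>(text[i + m])] : 1;
        j -= (j > 0) ? bwd[static_cast<unsigned char>(text[j - 1])] : 1;
    }
    return count;
}
```

For example, two_window_count("abcabcabc", "abc") visits positions 0 and 6 in the first round and position 3 in the second, then stops because the windows have crossed.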

Fig. 1. Mechanism of two windows and multiple windows

Suffix matching involves many comparisons. Let punishment denote the delay of a branch misprediction. The average branch cost of a character comparison is about \( 1 - \sigma^{ - 1} + \sigma^{ - 1} *punishment \), e.g., 10.5 ticks on a DNA sequence on Prescott. Instead, \( pref(SCW,w) \) can be read into an integer with an unaligned read and compared with the integer of \( pref(P,w) \); only when the two are equal are further comparisons needed. One integer comparison is equivalent to \( w \) character comparisons, and the average branch cost is reduced to \( 1 - \sigma^{ - w} + \sigma^{ - w} *punishment \). When comparing uint16_t/uint32_t on Prescott and a DNA sequence, the average branch cost drops to 3.27/1.15 ticks.
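A minimal sketch of the integer comparison for w = 4 is given here. The helper names load_u32 and window_matches are hypothetical, and the sketch uses memcpy rather than a pointer cast to express the unaligned read, since memcpy is well defined for any alignment and compilers typically reduce it to a single load on x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read 4 bytes starting at p into a uint32_t without assuming alignment.
static inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Returns true when the m-byte window starting at scw equals the pattern p.
// Requires m >= 4, because the first 4 characters are compared as one integer.
bool window_matches(const unsigned char* scw, const unsigned char* p,
                    std::size_t m) {
    assert(m >= 4);
    // One integer comparison stands in for four character comparisons.
    if (load_u32(scw) != load_u32(p)) return false;
    // Only when the integers are equal are the remaining m - 4 characters compared.
    return std::memcmp(scw + 4, p + 4, m - 4) == 0;
}
```

In a real search loop the integer of \( pref(P,w) \) would be computed once before scanning rather than reloaded for every window.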

3 Improved Algorithms Based on QS, Tuned BM and BMHq

Introducing the above methods into QS [1] yields a new algorithm called QSMI_wkXc, where k is the number of windows and X is the integer type used for comparison: S for short/uint16_t, I for int/uint32_t, and L for long long/uint64_t. The code of QSMI_w4Ic is listed as Algorithm 1.
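To illustrate how integer comparison fits into Quick Search, here is a simplified single-window sketch. It is not Algorithm 1 (QSMI_w4Ic uses four windows), and the names qs_int_search and load_u32 are hypothetical. It assumes m ≥ 4 and uses memcpy for the unaligned 32-bit read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Unaligned 32-bit read expressed with memcpy.
static inline std::uint32_t load_u32(const char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Quick Search with a 32-bit integer comparison on the window prefix.
// Returns the start positions of all occurrences of pat in text.
std::vector<std::size_t> qs_int_search(const std::string& text,
                                       const std::string& pat) {
    const std::size_t n = text.size(), m = pat.size();
    assert(m >= 4);
    std::vector<std::size_t> hits;
    if (n < m) return hits;

    // Quick Search bad-character table: m - (last index of c in pat),
    // or m + 1 when c does not occur in the pattern.
    std::size_t shift[256];
    for (std::size_t c = 0; c < 256; ++c) shift[c] = m + 1;
    for (std::size_t k = 0; k < m; ++k)
        shift[static_cast<unsigned char>(pat[k])] = m - k;

    const std::uint32_t pat_head = load_u32(pat.data());  // computed once
    std::size_t i = 0;
    while (i + m <= n) {
        // Integer comparison first; the remaining characters are compared
        // only when the 4-byte prefixes are equal.
        if (load_u32(text.data() + i) == pat_head &&
            std::memcmp(text.data() + i + 4, pat.data() + 4, m - 4) == 0)
            hits.push_back(i);
        if (i + m >= n) break;  // no character after the window to shift on
        i += shift[static_cast<unsigned char>(text[i + m])];
    }
    return hits;
}
```

Because most windows fail on the single integer comparison, the inner loop usually executes one hard-to-predict branch per window instead of one per character.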

Introducing the continuous jump method of Tuned BM into QSMI yields TBMMI. First, integer comparison determines whether a match occurs in each window. Then the bad-character jump of Quick Search is applied once and the bad-character jump of Horspool is applied several times in succession. TBMMI uses one QS jump and two Horspool jumps. TBMMI_w4Ic, which is obtained using only the bad-character jump table of QS, is shown as Algorithm 2.

We improved BMHq [3] by using a good-prefix rule to increase the jump distance, using unaligned reads to reduce the number of read operations, and adding the methods of Sect. 2. The resulting algorithm is named BMHqMI. BMH2MI_w4Ic is shown in Algorithm 3.

When \( suff(SCW,q - 1) \notin fac(P) \), BMHq slides the window from win0 to win1, as shown in Fig. 2. If \( suff(SCW,q - 1) \ne pref(P,q - 1) \), win1 cannot match, so the window should keep sliding until it reaches the first k that satisfies \( suff(SCW,k) = pref(P,k) \) (the window gets an extra jump to win2).

Fig. 2. Increase jump distance by good-prefix method.

Storing the jump distances for q-grams requires a q-dimensional table, in which one lookup needs q reads. An unaligned read can simulate the original q-dimensional lookup with a single read and a single table lookup. On a little-endian processor, with \( a = T[i + m - 2] \) and \( b = T[i + m - 1] \), *(uint16_t*)(T + i + m − 2) = \( a + b*256 \). If the 2-dimensional jump distance table is shift, build a 1-dimensional table shift1D such that \( shift1D[a + b*256] = shift[a][b] \) for \( \forall a,b \in \varSigma \). Then \( shift1D[*({\text{uint16\_t}}*)(T + i + m - 2)] = shift[a][b] \). If the string read is T[i + m − 2 … i + m] for q = 3, *(uint32_t*)(T + i + m − 2) & 0x00ffffffu = T[i + m − 2] + (int)T[i + m − 1] * 256 + (int)T[i + m] * 65536 can be used.
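The flattening for q = 2 can be sketched as follows. The function names flatten_shift, load_u16 and lookup_shift are hypothetical, the contents of the shift table are taken as given, and a little-endian processor is assumed, as in the text. The unaligned read is written with memcpy instead of a pointer cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Unaligned 16-bit read. On a little-endian CPU the result is p[0] + p[1] * 256.
static inline std::uint16_t load_u16(const unsigned char* p) {
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Flatten a 256 x 256 table shift[a][b] into shift1D[a + b * 256].
std::vector<std::uint32_t> flatten_shift(
        const std::vector<std::vector<std::uint32_t>>& shift) {
    assert(shift.size() == 256);
    std::vector<std::uint32_t> shift1D(65536);
    for (std::size_t a = 0; a < 256; ++a) {
        assert(shift[a].size() == 256);
        for (std::size_t b = 0; b < 256; ++b)
            shift1D[a + b * 256] = shift[a][b];
    }
    return shift1D;
}

// Jump distance for the 2-gram at the end of the window T[i .. i + m - 1]:
// one unaligned read plus one table lookup (little-endian assumed).
std::uint32_t lookup_shift(const std::vector<std::uint32_t>& shift1D,
                           const unsigned char* T, std::size_t i, std::size_t m) {
    return shift1D[load_u16(T + i + m - 2)];
}
```

The flattened table has 65536 entries regardless of σ, so this trades memory for a single read per lookup.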

4 Experiment and Results

We carried out the following experiment based on SMART 13.02 [6], which provides implementations of most known algorithms (published in EI- or SCI-indexed papers) as of Feb. 2013. The experimental platform is an Intel Core2 E3400 @ 3.0 GHz running Ubuntu 12.10 64-bit desktop, with g++ 4.6 and -O3 optimization. The test texts are three samples [8]: a DNA sequence (E.coli), pure English text (Bible.txt) and a sample of English natural language (world192.txt). The experiment compared all algorithms in SMART 13.02 plus some newer algorithms not included in SMART, such as SBNDMqb [9], GSB2b [9], FSO [10], HGQSkip [11], kSWxC [12], SufOM [13], Greedy-QF [14], etc. If an algorithm with different parameters is counted as different algorithms, more than 1000 algorithms were compared, covering most known algorithms. The experimental data (tens of thousands of records) cannot all be listed, so this paper lists only the three fastest algorithms under each matching condition. The experimental data are shown in Table 1, and the unit is MB/s.

Table 1. Matching speed of the fastest 3 algorithms and their optimal parameters

5 Conclusion

In this paper, three classical suffix matching algorithms, QS, TBM and BMHq, are improved by introducing Multi-window and unaligned-read integer comparison, and three suffix matching algorithms named QSMI, TBMMI and BMHqMI are proposed. The experimental results show that, for short patterns, these algorithms are faster than other known algorithms under multiple matching conditions.

6 Acknowledgements

This work was supported by the National Natural Science Foundation of Yunnan, China under Grants 2012FB131 and 2012FB137, the Key Project of National Natural Science Foundation of Yunnan, China under Grant 2014FA029, and the National Natural Science Foundation of China under Grant 61562051.