Pattern Matching with Flexible Wildcard Gaps
Pattern matching is a fundamental application in biomedicine and biological sequence analysis. A wildcard can match any one character in a sequence, and multiple consecutive wildcards form a gap. A flexible wildcard gap can match any run of characters whose length satisfies a constraint specified by the user. An effective algorithm for this kind of matching is therefore greatly needed. In this paper, we design the PMFG algorithm, which divides a pattern into multiple subpatterns of different lengths based on gap segmentation. After the starting and ending positions of each subpattern are computed, the effective intervals and effective starting positions are determined one subpattern at a time. The number of elements in the last effective position set equals the number of matches. Comparison experiments are conducted on three DNA sequences. The results show that the PMFG algorithm performs better than other algorithms in the same field.
Keywords: Pattern matching, Wildcard gap, Sequence
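The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' PMFG implementation; it is one plausible reading of the description, under several assumptions: the pattern is already split into literal subpatterns, each gap is given as an inclusive (min_len, max_len) pair, and a "match" is counted as a distinct effective starting position of the last subpattern. The names find_occurrences and effective_positions are hypothetical and chosen only for illustration.

```python
def find_occurrences(text, sub):
    """Return every start index at which sub occurs in text (overlaps allowed)."""
    starts = []
    i = text.find(sub)
    while i != -1:
        starts.append(i)
        i = text.find(sub, i + 1)
    return starts


def effective_positions(text, subpatterns, gaps):
    """Return the effective start positions of the last subpattern.

    subpatterns: list of literal strings obtained by splitting the pattern
                 at its gaps (assumed input format).
    gaps:        list of (min_len, max_len) pairs; gaps[i] lies between
                 subpatterns[i] and subpatterns[i + 1] (assumed input format).
    """
    if len(gaps) != len(subpatterns) - 1:
        raise ValueError("need exactly one gap between each pair of subpatterns")

    # Every occurrence of the first subpattern is effective by definition.
    effective = find_occurrences(text, subpatterns[0])

    for i, (lo, hi) in enumerate(gaps):
        prev_len = len(subpatterns[i])
        # Collect the start positions that the gap constraint allows
        # for the next subpattern, given the current effective starts.
        allowed = set()
        for start in effective:
            end = start + prev_len  # index just past the previous subpattern
            allowed.update(range(end + lo, end + hi + 1))
        # Keep only the occurrences of the next subpattern that fall
        # inside an allowed interval.
        effective = [p for p in find_occurrences(text, subpatterns[i + 1])
                     if p in allowed]

    return effective


if __name__ == "__main__":
    # Pattern A [0,2] TC [0,1] G on a short DNA-like string.
    hits = effective_positions("ATTCGATCG", ["A", "TC", "G"], [(0, 2), (0, 1)])
    print(len(hits), hits)
```

Under this reading, the length of the returned list plays the role of the match count. Building the allowed positions as an explicit set keeps the sketch short but is not efficient for wide gaps; an implementation working on interval endpoints would avoid that cost.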