Abstract
We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.
- A linear space index with query time \(O(m + \sigma^j \log\log n + occ)\). This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time Θ(jn) in the worst case.
- An index with query time O(m + j + occ) using space \(O(\sigma^{k^2} n \log^k\log n)\), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].
Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
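To make the problem statement concrete, the following minimal sketch reports the occurrences of a wildcard pattern by brute force. It is only a baseline that illustrates the query semantics, not any of the indexes described in the abstract: it builds no index and takes O(n(m + j)) time per query in the worst case. The wildcard symbol `*` and the function name `naive_wildcard_occurrences` are assumptions made for illustration; the abstract does not fix either.

```python
WILDCARD = "*"  # assumed wildcard symbol; the abstract does not specify one


def naive_wildcard_occurrences(t: str, p: str, wildcard: str = WILDCARD) -> list[int]:
    """Return the 0-based start positions in t at which p occurs.

    Each wildcard in p matches exactly one arbitrary character of t;
    every other character of p must match t exactly.
    """
    n, length = len(t), len(p)  # length = m characters + j wildcards
    occurrences = []
    for start in range(n - length + 1):
        # Compare p against the window t[start : start + length].
        if all(pc == wildcard or pc == t[start + i] for i, pc in enumerate(p)):
            occurrences.append(start)
    return occurrences


if __name__ == "__main__":
    # Pattern with m = 2 characters and j = 1 wildcard.
    print(naive_wildcard_occurrences("banana", "a*a"))
```

An index in the sense of the abstract preprocesses t once so that each query avoids this scan over all n positions.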
References
Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proc. 39th FOCS, pp. 534–543 (1998)
Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. In: Proc. 11th SODA, pp. 794–803 (2000)
Bille, P., Gørtz, I.L.: Substring Range Reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (2011)
Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String Matching with Variable Length Gaps. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 385–394. Springer, Heidelberg (2010)
Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994)
Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Disc. Algorithms 9(4), 358–364 (2011)
Chazelle, B.: Filtering search: A new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)
Chen, G., Wu, X., Zhu, X., Arslan, A., He, Y.: Efficient string matching with wildcards and length constraints. Knowl. Inf. Sys. 10(4), 399–419 (2006)
Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)
Coelho, L.P., Oliveira, A.L.: Dotted Suffix Trees: A Structure for Approximate Text Indexing. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 329–336. Springer, Heidelberg (2006)
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th STOC, pp. 91–100 (2004)
Cole, R., Hariharan, R.: Approximate string matching: A simpler faster algorithm. In: Proc. 9th SODA, pp. 463–472 (1998)
Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. 34th STOC, pp. 592–601 (2002)
Fischer, M.J., Paterson, M.S.: String-Matching and Other Products. In: Complexity of Computation, SIAM-AMS Proceedings, pp. 113–125 (1974)
Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 538–544 (1984)
Galil, Z., Giancarlo, R.: Improved string matching with k mismatches. ACM SIGACT News 17(4), 52–54 (1986)
Harel, D., Tarjan, R.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)
Iliopoulos, C.S., Rahman, M.S.: Pattern matching algorithms with don’t cares. In: Proc. 33rd SOFSEM, pp. 116–126 (2007)
Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)
Lam, T.-W., Sung, W.-K., Tam, S.-L., Yiu, S.-M.: Space Efficient Indexes for String Matching with Don’t Cares. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 846–857. Springer, Heidelberg (2007)
Landau, G., Vishkin, U.: Efficient string matching with k mismatches. Theoret. Comput. Sci. 43, 239–249 (1986)
Landau, G., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)
Lewenstein, M.: Indexing with Gaps. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 135–143. Springer, Heidelberg (2011)
Maas, M., Nowak, J.: Text indexing with errors. J. Disc. Algorithms 5(4), 662–681 (2007)
Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001)
Sahinalp, S., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proc. 37th FOCS, pp. 320–328 (1996)
Tam, A., Wu, E., Lam, T.-W., Yiu, S.-M.: Succinct Text Indexing with Wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 39–50. Springer, Heidelberg (2009)
Tsur, D.: Fast index for approximate string matching. J. Disc. Algorithms 8(4), 339–345 (2010)
Vildhøj, H.W., Vind, S.: String Indexing for Patterns with Wildcards. Master’s thesis, Technical University of Denmark (2011), http://www.imm.dtu.dk/~hwvi/
Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th SWAT, pp. 1–11 (1973)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bille, P., Gørtz, I.L., Vildhøj, H.W., Vind, S. (2012). String Indexing for Patterns with Wildcards. In: Fomin, F.V., Kaski, P. (eds) Algorithm Theory – SWAT 2012. SWAT 2012. Lecture Notes in Computer Science, vol 7357. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31155-0_25
DOI: https://doi.org/10.1007/978-3-642-31155-0_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31154-3
Online ISBN: 978-3-642-31155-0
eBook Packages: Computer Science (R0)