Abstract
Motivated by applications in bioinformatics, we extend the notion of gapped strings to elastic-degenerate strings. An elastic-degenerate string can be seen as an ordered collection of solid (standard) strings interleaved by elastic-degenerate symbols; each such symbol corresponds to a set of two or more variable-length solid strings. In this article, we present an algorithm for solving the pattern matching problem with a solid pattern and an elastic-degenerate text, running in \(\mathcal {O}(N+\alpha \gamma mn)\) time, where m is the length of the pattern; n and N are the length and total size of the elastic-degenerate text, respectively; and \(\alpha \) and \(\gamma \) are parameters representing, respectively, the maximum number of strings in any elastic-degenerate symbol of the text and the maximum number of elastic-degenerate symbols spanned by any occurrence of the pattern in the text. The space used by the proposed algorithm is \(\mathcal {O}(N)\).
This work was partially supported by the British Council funded INSPIRE Project.
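To make the problem statement concrete, the following Python sketch models an elastic-degenerate text and matches a solid pattern against it. It is a naive illustration only, not the algorithm of this paper, and it does not achieve the stated time bound. The representation (a list of segments, each a list of alternative solid strings, with a solid string stored as a one-element list), the function name `ed_match`, and the choice to report the segments in which an occurrence ends are all assumptions made for the example. Allowing the empty string as an alternative is likewise an assumption.

```python
def ed_match(text, pattern):
    """Return the sorted indices of segments in which an occurrence of
    `pattern` ends. `text` is a list of segments; each segment is a list
    of alternative solid strings (hypothetical representation)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    ends = []
    # Prefix lengths j (0 < j < m) such that pattern[:j] is matched by a
    # suffix of some string spelled by the segments processed so far.
    active = set()

    for i, segment in enumerate(text):
        new_active = set()
        found = False
        for s in segment:
            # Extend partial matches carried over from earlier segments.
            for j in active:
                rest = pattern[j:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True  # occurrence completes inside s
                elif rest.startswith(s):
                    # s is consumed entirely; an empty s keeps j unchanged.
                    new_active.add(j + len(s))
            # Start new matches at every position inside s.
            for k in range(len(s)):
                suffix = s[k:]
                if len(suffix) >= m:
                    if suffix.startswith(pattern):
                        found = True  # occurrence lies fully inside s
                elif pattern.startswith(suffix):
                    new_active.add(len(suffix))
        if found:
            ends.append(i)
        active = new_active
    return ends


# Solid string "AC", then a symbol with three alternatives, then "TA".
text = [["AC"], ["G", "GT", ""], ["TA"]]
print(ed_match(text, "CGTA"))  # matches via AC + G + TA
print(ed_match(text, "CTA"))   # matches via AC + (empty) + TA
print(ed_match(text, "GT"))    # inside "GT", and via G + TA
```

In this toy text the pattern `CGTA` spans one elastic-degenerate symbol, so in the notation of the abstract this occurrence would contribute a value of 1 towards \(\gamma \), while \(\alpha = 3\).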
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Iliopoulos, C.S., Kundu, R., Pissis, S.P. (2017). Efficient Pattern Matching in Elastic-Degenerate Texts. In: Drewes, F., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2017. Lecture Notes in Computer Science(), vol 10168. Springer, Cham. https://doi.org/10.1007/978-3-319-53733-7_9
DOI: https://doi.org/10.1007/978-3-319-53733-7_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53732-0
Online ISBN: 978-3-319-53733-7
eBook Packages: Computer Science, Computer Science (R0)