Efficient Pattern Matching in Elastic-Degenerate Texts

  • Conference paper
Language and Automata Theory and Applications (LATA 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10168)

Abstract

Motivated by applications in bioinformatics, we extend the notion of gapped strings to elastic-degenerate strings. An elastic-degenerate string can be seen as an ordered collection of solid (standard) strings interleaved by elastic-degenerate symbols; each such symbol corresponds to a set of two or more variable-length solid strings. In this article, we present an algorithm for the pattern matching problem with a solid pattern and an elastic-degenerate text. The algorithm runs in \(\mathcal {O}(N+\alpha \gamma mn)\) time, where m is the length of the pattern; n and N are the length and total size of the elastic-degenerate text, respectively; and \(\alpha \) and \(\gamma \) are parameters representing, respectively, the maximum number of strings in any elastic-degenerate symbol of the text and the maximum number of elastic-degenerate symbols spanned by any occurrence of the pattern in the text. The proposed algorithm uses \(\mathcal {O}(N)\) space.
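The abstract does not describe how the algorithm works. The following is therefore only a naive Python sketch of the problem it solves, not the authors' \(\mathcal {O}(N+\alpha \gamma mn)\) method.

The sketch makes several assumptions:

  • An elastic-degenerate text is modelled as a list of segments, and each segment is a list of alternative strings. A solid segment holds a single string.
  • The empty string is tolerated as an alternative.
  • An occurrence is reported by the 0-based index of the segment in which it ends.
  • N is taken to be the total number of characters over all alternatives. The paper's exact convention, for example for the empty string, may differ.
  • The helper names `ed_parameters` and `ed_occurrences` are hypothetical.

The matcher carries, from one segment to the next, the set of pattern-prefix lengths that can still be extended. This lets it find occurrences spanning any number of elastic-degenerate symbols. In the worst case, however, it does more work than the stated bound.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical model: an ED text is a sequence of segments, each a sequence of
# alternative strings (one string = solid segment, two or more = ED symbol).
EDText = Sequence[Sequence[str]]


def ed_parameters(text: EDText) -> Tuple[int, int, int]:
    """Return (n, N, alpha) under assumed conventions: number of segments,
    total number of characters in all alternatives, and the largest number
    of alternatives in any segment."""
    n = len(text)
    total_size = sum(len(s) for segment in text for s in segment)
    alpha = max((len(segment) for segment in text), default=0)
    return n, total_size, alpha


def ed_occurrences(pattern: str, text: EDText) -> List[int]:
    """Naive sketch (not the paper's algorithm): return the 0-based indices of
    segments in which at least one occurrence of `pattern` ends."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    ends: List[int] = []
    # active holds lengths k with 0 < k < m such that pattern[:k] is a suffix
    # of some string spelled by picking one alternative per segment so far.
    active: Set[int] = set()
    for i, segment in enumerate(text):
        found = False
        new_active: Set[int] = set()
        for s in segment:
            # 1. An occurrence lying entirely inside this alternative.
            if pattern in s:
                found = True
            # 2. Extend partial matches carried over from earlier segments.
            for k in active:
                rest = pattern[k:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True  # the occurrence ends in segment i
                elif rest.startswith(s):
                    # s is consumed entirely (this also covers s == "").
                    new_active.add(k + len(s))
            # 3. Start new partial matches: suffixes of s that are proper,
            #    non-empty prefixes of the pattern.
            for length in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:length]):
                    new_active.add(length)
        if found:
            ends.append(i)
        active = new_active
    return ends
```

As a usage example, consider the text `[["AC"], ["GT", "G", ""], ["TA"]]` with pattern `"CGT"`. Here `ed_occurrences` returns `[1, 2]`, because "ACGT" ends an occurrence in segment 1 and "ACGTA" ends one in segment 2. For the same text, `ed_parameters` returns `(3, 7, 3)`.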

This work was partially supported by the British Council funded INSPIRE Project.

Author information

Correspondence to Ritu Kundu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Iliopoulos, C.S., Kundu, R., Pissis, S.P. (2017). Efficient Pattern Matching in Elastic-Degenerate Texts. In: Drewes, F., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2017. Lecture Notes in Computer Science, vol 10168. Springer, Cham. https://doi.org/10.1007/978-3-319-53733-7_9

  • DOI: https://doi.org/10.1007/978-3-319-53733-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53732-0

  • Online ISBN: 978-3-319-53733-7

  • eBook Packages: Computer Science (R0)
