Subsequence versus substring constraints in sequence pattern languages

  • Steven Engels
  • Tony Tan
  • Jan Van den Bussche
Original Article


A family of logics for expressing patterns in sequences is investigated. The logics are all fragments of first-order logic, but they are variable-free. Instead, they use substring and subsequence constraints as basic propositions. Propositions expressing constraints on the beginning or the end of the sequence are also available. Wildcards can be used as well, which is important when the alphabet is not fixed, as is typical in database applications. The maximal logic, with all four features (substring constraints, subsequence constraints, begin–end constraints, and wildcards), turns out to be equivalent to the family of star-free regular languages of dot-depth at most one. We investigate the lattice formed by taking all possible combinations of these four features and show it to be strict. For instance, we formally confirm what might intuitively be expected, namely, that Boolean combinations of substring constraints are not sufficient to express subsequence constraints, and vice versa. We show that an expressiveness hierarchy results from allowing multiple wildcards. We also investigate what happens with regular expressions when concatenation is replaced by subsequencing. Finally, we study the expressiveness of our logic relative to first-order logic.
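
To make the four kinds of basic propositions concrete, the following is a minimal Python sketch of how substring, subsequence, and begin–end constraints might be checked on a finite sequence. It is an illustration only, not the formal semantics of the logics: the function names and the `WILDCARD` marker are hypothetical, and the sketch assumes a single wildcard that matches any one symbol, so it does not model the multiple-wildcard hierarchy mentioned above.

```python
# Illustrative sketch only; names and wildcard semantics are assumptions,
# not definitions taken from the article.

WILDCARD = object()  # hypothetical marker: matches any single symbol


def _match(p, s):
    """A pattern symbol matches a sequence symbol if equal or a wildcard."""
    return p is WILDCARD or p == s


def has_substring(seq, pattern):
    """True if pattern occurs in seq as a contiguous block."""
    n, m = len(seq), len(pattern)
    return any(
        all(_match(pattern[j], seq[i + j]) for j in range(m))
        for i in range(n - m + 1)
    )


def has_subsequence(seq, pattern):
    """True if pattern's symbols occur in seq in order, with gaps allowed."""
    it = iter(seq)  # shared iterator, so each match consumes a prefix of seq
    return all(any(_match(p, s) for s in it) for p in pattern)


def begins_with(seq, pattern):
    """True if seq starts with pattern."""
    m = len(pattern)
    return m <= len(seq) and all(_match(p, s) for p, s in zip(pattern, seq))


def ends_with(seq, pattern):
    """True if seq ends with pattern."""
    m = len(pattern)
    tail = seq[len(seq) - m:]
    return m <= len(seq) and all(_match(p, s) for p, s in zip(pattern, tail))


seq = "abcab"
# "aa" is a subsequence of "abcab" but not a substring of it.
print(has_subsequence(seq, "aa"), has_substring(seq, "aa"))
# A Boolean combination of basic propositions, as the logics allow:
print(begins_with(seq, "ab") and not has_substring(seq, "ba"))
```

The sequence `abcab` contains `aa` as a subsequence but not as a substring, which is the kind of distinction the strictness results formalise. The wildcard matters when the alphabet is not fixed: a pattern such as `["c", WILDCARD, "b"]` asks for `c`, then any one symbol, then `b`, without enumerating the alphabet.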



We would like to thank the anonymous referees for their careful and helpful comments in improving our paper. We also thank Frank Neven for suggesting the connection to locally testable languages, and Jean-Eric Pin for his encouragement and help in proving Theorem 6.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Hasselt University, Hasselt, Belgium
  2. National Taiwan University, Taipei City, Taiwan
