Regular Programming for Quantitative Properties of Data Streams

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9632)

Abstract

We propose quantitative regular expressions (QREs) as a high-level programming abstraction for specifying complex numerical queries over data streams in a modular way. Our language allows the arbitrary nesting of orthogonal sets of combinators: (a) generalized versions of choice, concatenation, and Kleene iteration from regular expressions, (b) streaming (serial) composition, and (c) numerical operators such as min, max, sum, difference, and averaging. Instead of requiring the programmer to work out the low-level details of what state needs to be maintained and how to update it while processing each data item, the regular constructs provide a global view of the entire data stream, splitting it into different cases and multiple chunks. The key technical challenge in defining our language is the design of typing rules that can be enforced efficiently and that strike a balance between expressiveness and theoretical guarantees for well-typed programs. We describe how to compile each QRE into an efficient streaming algorithm. The time and space complexity depends on the complexity of the data structure used to represent terms over the basic numerical operators. In particular, we show that when the set of numerical operations is sum, difference, minimum, maximum, and average, the compiled algorithm uses constant space and processes each symbol in the data stream in constant time, outputting the cost of the stream processed so far. Finally, we prove that the expressiveness of QREs coincides with the streaming composition of regular functions, that is, MSO-definable string-to-term transformations, which provides a potentially robust foundation for understanding their expressiveness and the complexity of analysis problems.
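
To make the constant-space, constant-time claim concrete, here is a minimal hand-compiled sketch in Python. It is not the paper's QRE syntax or its compilation scheme. It assumes a stream of numeric readings in which a hypothetical marker symbol "#" closes each chunk (for example, a day). It evaluates one query of the kind QREs are meant to express: the average, over completed chunks, of each chunk's maximum. The query nests an inner aggregator (max within a chunk) inside an outer one (average across chunks), in the spirit of iteration combined with numerical operators. The names `RunningMax`, `RunningAvg`, and `avg_of_chunk_max` are illustrative assumptions.

```python
from typing import Iterable, Iterator, Optional, Union

# A symbol is either a numeric reading or the (hypothetical) end-of-chunk marker "#".
Symbol = Union[float, str]


class RunningMax:
    """Constant-space maximum: stores only the largest value seen so far."""

    def __init__(self) -> None:
        self.best: Optional[float] = None

    def update(self, x: float) -> None:
        self.best = x if self.best is None else max(self.best, x)

    def value(self) -> Optional[float]:
        return self.best


class RunningAvg:
    """Constant-space average: stores only a running sum and a count."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, x: float) -> None:
        self.total += x
        self.count += 1

    def value(self) -> Optional[float]:
        return self.total / self.count if self.count else None


def avg_of_chunk_max(stream: Iterable[Symbol], marker: str = "#") -> Iterator[Optional[float]]:
    """Average over completed chunks of each chunk's maximum.

    Yields one output per input symbol: the cost of the prefix processed so far,
    or None when the prefix does not end on a chunk boundary (output undefined).
    State is three numbers plus one max, independent of stream length.
    """
    inner = RunningMax()   # aggregates the current, still-open chunk
    outer = RunningAvg()   # aggregates the maxima of all closed chunks
    for sym in stream:
        if sym == marker:
            m = inner.value()
            if m is not None:      # an empty chunk contributes nothing (a sketch choice)
                outer.update(m)
            inner = RunningMax()   # start a fresh chunk
            yield outer.value()
        else:
            inner.update(float(sym))
            yield None


if __name__ == "__main__":
    print(list(avg_of_chunk_max([3, 7, "#", 2, 4, "#"])))
    # [None, None, 7.0, None, None, 5.5]
```

Each symbol triggers a fixed number of arithmetic operations and the state never grows, which is the behaviour the abstract attributes to compiled QREs over sum, min, max, and average. A real QRE compiler would derive this state and its update rules automatically from the expression. The sketch writes them by hand.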

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. University of Pennsylvania, Philadelphia, USA
