International Journal of Parallel Programming, Volume 45, Issue 2, pp 382–401

Parallel Patterns for Window-Based Stateful Operators on Data Streams: An Algorithmic Skeleton Approach

Abstract

Data Stream Processing is a recent and highly active research area dealing with the in-memory, tuple-by-tuple analysis of streaming data. Continuous queries typically consume huge volumes of data received at high velocity. Solutions that persistently store all the input tuples and then perform off-line computation are impractical; instead, queries must be executed continuously as data flow through the streams. The goal of this paper is to present parallel patterns for window-based stateful operators, the most representative class of stateful data stream operators. The patterns are presented in the style of Algorithmic Skeletons: for each pattern we explain its rationale, the preconditions for applying it safely, and its outcome in terms of throughput, latency and memory consumption. The patterns have been implemented in the FastFlow framework, targeting off-the-shelf multicores. To the best of our knowledge, this is the first effort to merge the Data Stream Processing domain with the field of Structured Parallelism.
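To make the notion of a window-based stateful operator concrete, the following Python sketch (an illustration under stated assumptions, not the paper's FastFlow implementation) models a count-based sliding window of `win` tuples that advances by `slide` tuples, with `sum` as a stand-in aggregation. The function names `sequential`, `window_farming`, `windows_containing` and `n_complete_windows` are hypothetical. The parallel variant assumes a simple farm-like distribution in which window `w` is assigned to worker `w % n_workers`, an emitter replicates each tuple to every worker owning a window that contains it, and a collector restores window order. The abstract does not say whether this matches any of the paper's patterns, and the workers here are simulated one after another rather than run concurrently.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Agg = Callable[[List[int]], int]


def n_complete_windows(n: int, win: int, slide: int) -> int:
    """Number of complete count-based windows over a finite stream of n tuples."""
    return (n - win) // slide + 1 if n >= win else 0


def windows_containing(t: int, win: int, slide: int) -> range:
    """Ids of the windows that contain the tuple at position t.

    Window w covers the positions [w * slide, w * slide + win).
    """
    first = max(0, (t - win) // slide + 1)  # smallest w with w*slide + win > t
    last = t // slide                       # largest w with w*slide <= t
    return range(first, last + 1)


def sequential(stream: Sequence[int], win: int, slide: int, agg: Agg) -> List[int]:
    """Reference operator: aggregate every complete window, in window order."""
    count = n_complete_windows(len(stream), win, slide)
    return [agg(list(stream[w * slide : w * slide + win])) for w in range(count)]


def window_farming(stream: Sequence[int], win: int, slide: int,
                   agg: Agg, n_workers: int) -> List[int]:
    """Hypothetical farm sketch: window w is owned by worker w % n_workers."""
    count = n_complete_windows(len(stream), win, slide)

    # Emitter: send each tuple to every worker that owns a window containing it.
    inboxes = [[] for _ in range(n_workers)]
    for t, value in enumerate(stream):
        owners = {w % n_workers for w in windows_containing(t, win, slide) if w < count}
        for k in owners:
            inboxes[k].append((t, value))

    # Workers (simulated sequentially here): buffer each owned window and
    # aggregate it as soon as it holds `win` tuples, then free its buffer.
    results: Dict[int, int] = {}
    for k, inbox in enumerate(inboxes):
        buffers: Dict[int, List[int]] = defaultdict(list)
        for t, value in inbox:
            for w in windows_containing(t, win, slide):
                if w < count and w % n_workers == k:
                    buffers[w].append(value)
                    if len(buffers[w]) == win:
                        results[w] = agg(buffers.pop(w))

    # Collector: emit the results in window order.
    return [results[w] for w in range(count)]


if __name__ == "__main__":
    data = list(range(10))
    print(sequential(data, 4, 2, sum))         # [6, 14, 22, 30]
    print(window_farming(data, 4, 2, sum, 3))  # [6, 14, 22, 30]
```

Because a tuple belongs to up to ceil(win/slide) windows, a distribution of this kind may replicate it to several workers, which is one way the choice of parallel pattern can affect memory consumption as well as throughput and latency.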

Keywords

Parallel patterns · Algorithmic skeletons · Data stream processing · Multi-/many-core architectures

Notes

Acknowledgments

This work has been partially supported by the EU H2020 project RePhrase (EC-RIA, H2020, ICT-2014-1).

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Computer Science, University of Pisa, Pisa, Italy
