Parallel Patterns for Window-Based Stateful Operators on Data Streams: An Algorithmic Skeleton Approach

International Journal of Parallel Programming

Abstract

Data Stream Processing is a recent and highly active research area dealing with the in-memory, tuple-by-tuple analysis of streaming data. Continuous queries typically consume huge volumes of data received at high velocity. Solutions that persistently store all the input tuples and then perform off-line computation are impractical. Rather, queries must be executed continuously as data flow through the streams. The goal of this paper is to present parallel patterns for window-based stateful operators, which are the most representative class of stateful data stream operators. Parallel patterns are presented “à la” Algorithmic Skeleton, by explaining the rationale of each pattern, the preconditions to safely apply it, and the outcome in terms of throughput, latency and memory consumption. The patterns have been implemented in the FastFlow framework targeting off-the-shelf multicores. To the best of our knowledge, this is the first effort to merge the Data Stream Processing domain with the field of Structured Parallelism.

Notes

  1. Replicated in a hypothetical message-passing abstract model. On multicores, depending on the run-time support in use, tuple replication can be avoided by sharing data, i.e., by passing memory pointers to the input tuples.
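
The difference between replicating tuples and sharing them can be illustrated with a minimal sketch. It is not FastFlow code: the names `StreamTuple`, `multicast_replicated` and `multicast_shared` are hypothetical, and Python object references stand in for the memory pointers mentioned in the note. The sketch assumes three workers that each buffer the tuples they receive.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass
class StreamTuple:
    """Hypothetical input tuple: a timestamp plus a payload value."""
    timestamp: int
    value: float


def multicast_replicated(item, windows):
    """Message-passing style: every worker window receives its own copy."""
    for window in windows:
        # replace() with no field changes builds a new, equal object
        window.append(replace(item))


def multicast_shared(item, windows):
    """Shared-memory style: every worker window stores a reference to one object."""
    for window in windows:
        window.append(item)


# Three hypothetical workers, each buffering the last four tuples it was sent.
copied = [deque(maxlen=4) for _ in range(3)]
shared = [deque(maxlen=4) for _ in range(3)]

t = StreamTuple(timestamp=1, value=2.5)
multicast_replicated(t, copied)
multicast_shared(t, shared)

# Count the distinct objects held across the worker buffers in each variant.
print(len({id(w[0]) for w in copied}))
print(len({id(w[0]) for w in shared}))
```

In the replicated variant each worker holds a distinct object, so memory grows with the number of workers. In the shared variant all buffers refer to the same object, which is only safe if the workers treat the input tuples as read-only.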

Acknowledgments

This work has been partially supported by the EU H2020 project RePhrase (EC-RIA, H2020, ICT-2014-1).

Author information

Corresponding author

Correspondence to Gabriele Mencagli.

About this article

Cite this article

De Matteis, T., Mencagli, G. Parallel Patterns for Window-Based Stateful Operators on Data Streams: An Algorithmic Skeleton Approach. Int J Parallel Prog 45, 382–401 (2017). https://doi.org/10.1007/s10766-016-0413-x

  • DOI: https://doi.org/10.1007/s10766-016-0413-x
