Data-Streaming and Concurrent Data-Object Co-design: Overview and Algorithmic Challenges

  • Vincenzo Gulisano
  • Yiannis Nikolakopoulos
  • Marina Papatriantafilou
  • Philippas Tsigas
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9295)

Abstract

Processing large volumes of data generated online requires carrying out computations on the fly, directly on the data streams. In parallel data-stream computing, the underlying data objects can provide the means for exchanging data so that communication and work imbalance between the concurrent threads performing the computation are reduced, while pipeline parallelism is enhanced. By shedding light on concurrent data objects and their role as articulation points in data-stream processing, we lay some cornerstones for analyzing the problems, propose new data structures suitable for a set of functions, and identify new key challenges in improving data-stream processing through co-design of fine-grain, efficient synchronization with the data exchange.

It is interesting to point out that research in distributed computing on efficient and consistent data sharing in multiprocessors through fine-grain synchronization emerged from questions in concurrent database-related research. Approximately three decades later, several fruits of this expedition are returning to help with new problems in the massive-data research domain, with applications in, e.g., cyber-physical systems.

Keywords

Concurrent data structures · Data-streaming · Stream processing engines · In-memory data analysis

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Vincenzo Gulisano (1)
  • Yiannis Nikolakopoulos (1)
  • Marina Papatriantafilou (1)
  • Philippas Tsigas (1)

  1. Chalmers University of Technology, Gothenburg, Sweden
