A Scalable Platform for Low-Latency Real-Time Analytics of Streaming Data

  • Paolo Cappellari
  • Mark Roantree
  • Soon Ae Chun
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 737)


The ability to process high-volume, high-speed streaming data from different data sources is critical for modern organizations to gain insights for business decisions. In this research, we present the streaming analytics platform (SDAP), which provides a set of operators for specifying stream data transformations and analytics. SDAP adopts a declarative approach to modeling and design, delivering analytics capabilities through the simple combination of a set of primitive operators. The model includes a topology for designing streaming analytics specifications from a set of atomic data manipulation operators. Our evaluation demonstrates that SDAP maintains low latency while scaling to a cloud of distributed computing nodes, and that it simplifies the design and execution of streaming analytics processes.


Data stream processing; High-performance computing; Low-latency; Distributed systems



This research was supported, in part, by Collective[i] Grant RF-7M617-00-01, National Science Foundation Grants CNS-0958379, CNS-0855217, and ACI-1126113, and the City University of New York High Performance Computing Center at the College of Staten Island.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Paolo Cappellari (1)
  • Mark Roantree (2)
  • Soon Ae Chun (1)
  1. City University of New York, New York, USA
  2. School of Computing, Insight Centre for Data Analytics, Dublin City University, Dublin, Ireland
