Middleware for big data processing: test results

Abstract

Processing large volumes of data is resource-intensive work that is increasingly delegated not to a single computer but to a whole distributed computing system. As the number of computers in a distributed system increases, so does the effort required to manage it effectively. Once the system reaches some critical size, much of that effort must go into improving its fault tolerance. It is difficult to estimate when a particular distributed system needs such facilities for a given workload, so they should instead be implemented in middleware that works efficiently with a distributed system of any size. It is equally difficult to decide whether a given volume of data is large, so the middleware should also work with data of any volume. In other words, the purpose of the middleware is to provide facilities that adapt a distributed computing system to a given workload. In this paper we introduce such a middleware appliance. Tests show that this middleware is well suited to typical HPC and big data workloads and that its performance is comparable to that of well-known alternatives.

Corresponding author

Correspondence to I. Gankevich.

Additional information

The article is published in the original.

Cite this article

Gankevich, I., Gaiduchok, V., Korkhov, V. et al. Middleware for big data processing: test results. Phys. Part. Nuclei Lett. 14, 1001–1007 (2017). https://doi.org/10.1134/S1547477117070068