Processing large volumes of data is resource-intensive work that is increasingly delegated not to a single computer but to a whole distributed computing system. As the number of computers in a distributed system increases, so does the effort required to manage it effectively. Once the system reaches some critical size, much of that effort must go into improving its fault tolerance. It is difficult to estimate when a particular distributed system needs such facilities for a given workload, so they should instead be implemented in middleware that works efficiently with a distributed system of any size. It is equally difficult to estimate whether a volume of data is large or not, so the middleware should also work with data of any volume. In other words, the purpose of the middleware is to provide facilities that adapt a distributed computing system to a given workload. In this paper we introduce such a middleware appliance. Tests show that this middleware is well suited to typical HPC and big data workloads and that its performance is comparable with well-known alternatives.
The article is published in the original.
Cite this article
Gankevich, I., Gaiduchok, V., Korkhov, V. et al. Middleware for big data processing: test results. Phys. Part. Nuclei Lett. 14, 1001–1007 (2017). https://doi.org/10.1134/S1547477117070068