Factory: Master Node High-Availability for Big Data Applications and Beyond

Gankevich, Ivan; Tipikin, Yuri; Korkhov, Vladimir; Gaiduchok, Vladimir; Degtyarev, Alexander; Bogdanov, Alexander

doi:10.1007/978-3-319-42108-7_29

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9787)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Master-node fault tolerance is a topic that is often overlooked in discussions of big data processing technologies. Although failure of a master node can take down the whole data processing pipeline, it is considered either improbable or too difficult to handle. The aim of the studies reported here is to propose a rather simple technique for dealing with master-node failures. The technique is based on temporarily delegating the master role to one of the slave nodes and transferring the updated state back to the master when one step of computation is complete. That way the state is duplicated, and computation can proceed to the next step regardless of a failure of the delegate or the master (but not both). We run benchmarks to show that a failure of the master is almost “invisible” to other nodes, and that a failure of the delegate results in recomputation of only one step of the data processing pipeline. We believe that the technique can be used not only in big data processing but also in other types of applications.
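The delegation scheme described in the abstract can be illustrated with a minimal single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function name `run_pipeline`, the parameters `delegate_fails_at` and `master_fails_at`, and the idea of modelling a node crash as a lost copy of the state are all hypothetical and introduced here for illustration.

```python
import copy


def run_pipeline(steps, state, delegate_fails_at=frozenset(),
                 master_fails_at=frozenset()):
    """Simulate per-step delegation of the master role (hypothetical sketch).

    steps: list of functions, each mapping a state to a new state.
    delegate_fails_at: indices of steps at which the delegate crashes once
        after computing, before returning its result.
    master_fails_at: indices of steps during which the master's copy is lost.
    Returns (final_state, number_of_step_executions).
    """
    master_copy = copy.deepcopy(state)
    executions = 0
    for index, step in enumerate(steps):
        delegate_crashed = False
        while True:
            # The delegate works on its own copy, so the state is duplicated.
            delegate_copy = step(copy.deepcopy(master_copy))
            executions += 1
            if index in delegate_fails_at and not delegate_crashed:
                # Delegate lost: its result is discarded, the master still
                # holds the state from the previous step, so only this one
                # step is recomputed.
                delegate_crashed = True
                continue
            break
        if index in master_fails_at:
            # Master lost: its copy is gone, but the delegate's copy survives.
            master_copy = None
        # The updated state is transferred back (or restored) from the delegate.
        master_copy = copy.deepcopy(delegate_copy)
    return master_copy, executions


# Three toy pipeline steps applied to an integer state.
steps = [lambda s: s + 1, lambda s: s * 2, lambda s: s - 3]
print(run_pipeline(steps, 5))
print(run_pipeline(steps, 5, delegate_fails_at={1}))
print(run_pipeline(steps, 5, master_fails_at={2}))
```

In this toy model a master failure costs no extra step executions, while a delegate failure costs exactly one, which mirrors the behaviour the abstract claims. A simultaneous failure of both nodes in the same step is deliberately not modelled, since the abstract states that the technique does not cover it.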

Acknowledgements

The research was carried out using computational resources of the Resource Centre “Computational Centre of Saint Petersburg State University” (T-EDGE96 HPC-0011828-001) within the framework of grants from the Russian Foundation for Basic Research (projects no. 16-07-01111, 16-07-00886, 16-07-01113) and Saint Petersburg State University (project no. 0.37.155.2014).

Author information

Authors and Affiliations

Department of Computer Modelling and Multiprocessor Systems, Saint Petersburg State University, Universitetskaia emb. 7-9, 199034, Saint Petersburg, Russia
Ivan Gankevich, Yuri Tipikin, Vladimir Korkhov, Vladimir Gaiduchok, Alexander Degtyarev & Alexander Bogdanov

Corresponding author

Correspondence to Ivan Gankevich.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Polytechnic University, Bari, Italy
Carmelo M. Torre
Monash University, Clayton, Victoria, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
Beijing University of Posts & Telecommunication, Beijing, China
Shangguang Wang

About this paper

Cite this paper

Gankevich, I., Tipikin, Y., Korkhov, V., Gaiduchok, V., Degtyarev, A., Bogdanov, A. (2016). Factory: Master Node High-Availability for Big Data Applications and Beyond. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9787. Springer, Cham. https://doi.org/10.1007/978-3-319-42108-7_29

DOI: https://doi.org/10.1007/978-3-319-42108-7_29
Published: 12 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42107-0
Online ISBN: 978-3-319-42108-7
eBook Packages: Computer Science, Computer Science (R0)