Factory: Master Node High-Availability for Big Data Applications and Beyond

  • Conference paper
Computational Science and Its Applications – ICCSA 2016 (ICCSA 2016)

Abstract

Master-node fault tolerance is a topic that is often overlooked in discussions of big data processing technologies. Although the failure of a master node can take down the whole data processing pipeline, it is usually considered either improbable or too difficult to handle. The aim of the study reported here is to propose a rather simple technique for dealing with master-node failures. The technique is based on temporarily delegating the master role to one of the slave nodes and transferring the updated state back to the master when one step of the computation is complete. That way the state is duplicated, and the computation can proceed to the next step regardless of a failure of the delegate or the master (but not both). We run benchmarks to show that a failure of the master is almost “invisible” to other nodes, and that a failure of the delegate results in recomputation of only one step of the data processing pipeline. We believe that the technique can be used not only in big data processing but also in other types of applications.
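The delegation scheme described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the implementation from the paper: the names `run_pipeline`, `STEPS`, `delegate_fails` and `master_fails` are hypothetical, node failures are modelled as sets of step indices, and a real system would exchange state over the network rather than copy dictionaries.

```python
import copy
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Dict[str, int]
Step = Callable[[State], State]


def run_pipeline(
    steps: List[Step],
    initial: State,
    delegate_fails: Iterable[int] = (),
    master_fails: Iterable[int] = (),
) -> Tuple[State, int]:
    """Simulate step-wise delegation; return (final state, step attempts).

    delegate_fails / master_fails are hypothetical inputs: indices of the
    steps during which the delegate or the master goes down once.
    """
    master_state: Optional[State] = copy.deepcopy(initial)
    pending_delegate_failures = set(delegate_fails)
    master_failures = set(master_fails)
    attempts = 0

    for index, step in enumerate(steps):
        while True:
            attempts += 1
            # Delegation: the delegate receives its own copy of the state,
            # so two copies exist while the step is running.
            delegate_state = copy.deepcopy(master_state)

            delegate_down = index in pending_delegate_failures
            master_down = index in master_failures
            if delegate_down and master_down:
                # Both copies are gone; the scheme does not cover this case.
                raise RuntimeError(f"master and delegate both failed at step {index}")

            if delegate_down:
                # The master still holds the pre-step state, so only this
                # one step is repeated on a new delegate.
                pending_delegate_failures.discard(index)
                continue

            if master_down:
                # The master's copy is lost mid-step; the delegate's copy survives.
                master_state = None

            # The delegate finishes the step and the updated state is
            # transferred back to the (possibly restarted) master.
            master_state = step(delegate_state)
            break

    return master_state, attempts


# A toy three-step pipeline used only for illustration.
STEPS: List[Step] = [
    lambda s: {"total": s["total"] + 1},
    lambda s: {"total": s["total"] * 2},
    lambda s: {"total": s["total"] + 3},
]

if __name__ == "__main__":
    print(run_pipeline(STEPS, {"total": 0}))                      # no failures
    print(run_pipeline(STEPS, {"total": 0}, delegate_fails={1}))  # one step repeated
    print(run_pipeline(STEPS, {"total": 0}, master_fails={1}))    # no extra work
```

In this toy model a delegate failure costs exactly one extra step attempt, while a master failure costs none, which mirrors the behaviour the abstract claims for the benchmarks.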

Acknowledgements

The research was carried out using computational resources of the Resource Centre “Computational Centre of Saint Petersburg State University” (T-EDGE96 HPC-0011828-001) within the framework of grants from the Russian Foundation for Basic Research (projects no. 16-07-01111, 16-07-00886, 16-07-01113) and Saint Petersburg State University (project no. 0.37.155.2014).

Author information

Corresponding author

Correspondence to Ivan Gankevich.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gankevich, I., Tipikin, Y., Korkhov, V., Gaiduchok, V., Degtyarev, A., Bogdanov, A. (2016). Factory: Master Node High-Availability for Big Data Applications and Beyond. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2016. ICCSA 2016. Lecture Notes in Computer Science, vol 9787. Springer, Cham. https://doi.org/10.1007/978-3-319-42108-7_29

  • DOI: https://doi.org/10.1007/978-3-319-42108-7_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42107-0

  • Online ISBN: 978-3-319-42108-7

  • eBook Packages: Computer Science, Computer Science (R0)
