Novel Approaches for Distributing Workload on Commodity Computer Systems

  • Ivan Gankevich
  • Yuri Tipikin
  • Alexander Degtyarev
  • Vladimir Korkhov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9158)

Abstract

Efficient management of a distributed system is a common problem for university and commercial computer centres, and handling node failures is a major aspect of it. Failures that are rare in a small commodity cluster become common at large scale, and there should be a way to overcome them without restarting all parallel processes of an application. The efficiency of existing methods can be improved by forming a hierarchy of distributed processes. That way, only the lower levels of the hierarchy need to be restarted when a leaf node fails, and only the root node needs special treatment. The process hierarchy changes in real time, and the workload is dynamically rebalanced across online nodes. This approach makes it possible to implement efficient partial restart of a parallel application and transactional behaviour for computer centre service tasks.
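The partial-restart idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Task` class, the node names, and the rule that a failed task is restarted together with its whole subtree are assumptions made for illustration only. The sketch shows that a failure on a node hosting only lower-level tasks requires restarting a part of the hierarchy, while a failure on the root's node is reported separately because the abstract says the root needs special treatment.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A process in the hierarchy (hypothetical model, not from the paper)."""
    name: str
    node: str                                   # cluster node hosting this task
    children: list = field(default_factory=list)

    def subtree(self):
        """Yield this task and all of its descendants, parents first."""
        yield self
        for child in self.children:
            yield from child.subtree()


def tasks_to_restart(root, failed_node):
    """Return the names of tasks to restart after `failed_node` goes offline.

    Assumption: a task lost with its node is restarted together with its
    descendants, and everything above it in the hierarchy keeps running.
    """
    if root.node == failed_node:
        # The abstract only says the root needs special treatment;
        # this sketch does not attempt to model it.
        raise ValueError("root node failed: needs special treatment")

    restart = []

    def visit(task):
        if task.node == failed_node:
            restart.extend(t.name for t in task.subtree())
        else:
            for child in task.children:
                visit(child)

    visit(root)
    return restart


# Example hierarchy spread over four hypothetical nodes n0..n3.
hierarchy = Task("root", "n0", [
    Task("a", "n1", [Task("a1", "n1"), Task("a2", "n2")]),
    Task("b", "n2", [Task("b1", "n3")]),
])
```

With this hierarchy, `tasks_to_restart(hierarchy, "n3")` selects only the leaf `b1`, whereas a failure of `n1` selects `a` and its two children; the root and the unaffected branch are left running in both cases.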

Keywords

  • Long-lived transactions
  • Distributed pipeline
  • Node discovery
  • Software engineering
  • Distributed computing
  • Cluster computing

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ivan Gankevich (1)
  • Yuri Tipikin (1)
  • Alexander Degtyarev (1)
  • Vladimir Korkhov (1)
  1. Saint Petersburg State University, Saint Petersburg, Russia
