Fault Characterization and Mitigation Strategies in Desktop Cloud Systems

Part of the Communications in Computer and Information Science book series (CCIS, volume 979)

Abstract

Desktop cloud platforms, such as UnaCloud and CernVM, run clusters of virtual machines on the idle resources of desktop computers. These virtual machines execute alongside the applications that users start on those desktops. Although this improves the utilization of computing resources, desktop user actions, such as turning off the computer or running certain applications, may conflict with the virtual machines. Desktop clouds commonly run applications based on technologies such as TensorFlow or Hadoop, which rely on master-worker architectures and are sensitive to failures in specific nodes. To support these new types of applications, it is important to understand which failures may interrupt the execution of these clusters, which faults may cause the underlying errors, and which strategies can be used to mitigate or tolerate them. Using the UnaCloud platform as a case study, this paper presents an analysis of (1) the failures that may occur in desktop clouds and (2) the mitigation strategies available to improve dependability.
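To make the claim about master-worker sensitivity concrete, the following Python sketch models a single job under a deliberately simple assumption: each desktop stays on independently with a fixed probability for the duration of the job. The function names, the availability figures, and the replicated-master variant are hypothetical illustrations, not the failure model or the mitigation strategies analyzed in the paper.

```python
from math import comb


def worker_quorum_prob(n_workers: int, k_needed: int, p_worker: float) -> float:
    """Probability that at least k_needed of n_workers worker VMs stay up.

    Hypothetical model: each desktop hosting a worker stays on independently
    with probability p_worker, so the number of surviving workers is binomial.
    """
    return sum(
        comb(n_workers, i) * p_worker**i * (1 - p_worker) ** (n_workers - i)
        for i in range(k_needed, n_workers + 1)
    )


def job_survival_prob(n_workers: int, k_needed: int,
                      p_worker: float, p_master: float) -> float:
    """Job completes only if the master survives AND enough workers survive."""
    # The master is a single point of failure: one desktop user turning off
    # that machine ends the job regardless of how many workers remain.
    return p_master * worker_quorum_prob(n_workers, k_needed, p_worker)


def replicated_master_prob(p_master: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent master copies survives."""
    # Illustrative tolerance strategy (hypothetical): place master replicas on
    # several desktops and assume any surviving replica can take over instantly.
    return 1 - (1 - p_master) ** replicas


if __name__ == "__main__":
    p = 0.9  # assumed per-desktop availability during the job
    workers = worker_quorum_prob(8, 6, p)
    single = job_survival_prob(8, 6, p, p_master=p)
    replicated = replicated_master_prob(p, replicas=2) * workers
    print(f"workers only (6 of 8 needed): {workers:.4f}")
    print(f"single master:                {single:.4f}")
    print(f"master with 2 replicas:       {replicated:.4f}")
```

With these assumed numbers the script prints roughly 0.9619, 0.8657, and 0.9523: losing any one of eight interchangeable workers is absorbed by the quorum, while the lone master's desktop alone removes about ten percentage points of survival probability. The independence and instant-failover assumptions are optimistic; correlated events such as a lab-wide shutdown at the end of the day would break them.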

Keywords

  • Desktop clouds
  • Dependability
  • Reliability
  • Fault analysis
  • Fault tolerance

This work has been partially carried out with resources provided by the CYTED cofunded Thematic Network RICAP (517RT0529).

Notes

  1. https://sistemasproyectos.uniandes.edu.co/iniciativas/unacloud/.
  2. https://cernvm.cern.ch/portal/publications.
  3. https://www.tensorflow.org/.
  4. https://hadoop.apache.org/.
  5. http://www.uniandes.edu.co.


Author information

Correspondence to Carlos E. Gómez.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gómez, C.E., Chavarriaga, J., Castro, H.E. (2019). Fault Characterization and Mitigation Strategies in Desktop Cloud Systems. In: Meneses, E., Castro, H., Barrios Hernández, C., Ramos-Pollan, R. (eds) High Performance Computing. CARLA 2018. Communications in Computer and Information Science, vol 979. Springer, Cham. https://doi.org/10.1007/978-3-030-16205-4_24

  • DOI: https://doi.org/10.1007/978-3-030-16205-4_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16204-7

  • Online ISBN: 978-3-030-16205-4

  • eBook Packages: Computer Science, Computer Science (R0)