Achieving and Assuring High Availability

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5017)

Abstract

We discuss availability aspects of large software-based systems. We classify faults into Bohrbugs, Mandelbugs and aging-related bugs, and then examine mitigation methods for the last two bug types. We also consider quantitative approaches to availability assurance.

Editor information

Takashi Nanya, Fumihiro Maruyama, András Pataricza, Miroslaw Malek

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Trivedi, K. et al. (2008). Achieving and Assuring High Availability. In: Nanya, T., Maruyama, F., Pataricza, A., Malek, M. (eds) Service Availability. ISAS 2008. Lecture Notes in Computer Science, vol 5017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68129-8_4

  • DOI: https://doi.org/10.1007/978-3-540-68129-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68128-1

  • Online ISBN: 978-3-540-68129-8

  • eBook Packages: Computer Science, Computer Science (R0)
