Model Based Approach for Autonomic Availability Management

  • Kesari Mishra
  • Kishor S. Trivedi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4328)


As increasingly complex computer systems have come to play a controlling role in all aspects of modern life, system availability and the associated downtime of technical systems have acquired critical importance. Losses due to system downtime have risen manifold and become wide-ranging. Even though the component-level availability of hardware and software has increased considerably, system-wide availability still needs improvement, because the heterogeneity of components and the complexity of their interconnections have also increased considerably. As systems become more interconnected and diverse, architects are less able to anticipate and design for every interaction among components, leaving such issues to be dealt with at runtime. Therefore, in this paper, we propose an approach for autonomic management of system availability, which provides real-time evaluation, monitoring and management of the availability of systems in critical applications. A hybrid approach is used: analytic models provide the behavioral abstraction of components and subsystems, their interconnections and their dependencies, while statistical inference is applied to data from real-time monitoring of those components and subsystems in order to parameterize the system availability model. The model is solved online (that is, in real time), so that at any instant both point and interval estimates of the overall system availability are obtained by propagating the point and interval estimates of each input parameter through the system model. The online monitoring and estimation of system availability can then lead to adaptive online control of system availability.
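The propagation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-component system structure (one web server in series with a parallel pair of database servers), the observation data, and every function name are hypothetical, and a percentile bootstrap is swapped in as one simple way of obtaining an interval estimate from monitored failure and repair times.

```python
import random
import statistics


def availability(mttf, mttr):
    """Steady-state availability of one component: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def system_availability(a_web, a_db1, a_db2):
    """Hypothetical system model: a web server in series with two
    database servers in parallel (the system is up if the web server
    and at least one database server are up)."""
    return a_web * (1.0 - (1.0 - a_db1) * (1.0 - a_db2))


def resampled_mean(samples, rng):
    """Mean of one bootstrap resample (drawn with replacement)."""
    return statistics.fmean(rng.choices(samples, k=len(samples)))


def estimate(observations, n_boot=2000, level=0.90, seed=0):
    """Return (point, (low, high)) for the system availability.

    observations holds one (times_to_failure, times_to_repair) pair per
    component, in the argument order of system_availability.
    """
    rng = random.Random(seed)
    # Point estimate: plug the sample means into the system model.
    point = system_availability(
        *(availability(statistics.fmean(ttf), statistics.fmean(ttr))
          for ttf, ttr in observations)
    )
    # Interval estimate: propagate resampled parameters through the model.
    draws = sorted(
        system_availability(
            *(availability(resampled_mean(ttf, rng), resampled_mean(ttr, rng))
              for ttf, ttr in observations)
        )
        for _ in range(n_boot)
    )
    low = draws[round((1.0 - level) / 2.0 * n_boot)]
    high = draws[round((1.0 + level) / 2.0 * n_boot) - 1]
    return point, (low, high)


# Hypothetical monitoring data, in hours: (times to failure, times to repair).
OBSERVATIONS = [
    ([900.0, 1100.0, 1000.0, 950.0], [2.0, 3.0, 2.5, 1.5]),
    ([400.0, 600.0, 500.0, 450.0], [5.0, 8.0, 6.0, 7.0]),
    ([420.0, 580.0, 510.0, 470.0], [6.0, 7.0, 5.0, 9.0]),
]

point, (low, high) = estimate(OBSERVATIONS)
print(f"point estimate: {point:.6f}, 90% interval: [{low:.6f}, {high:.6f}]")
```

In an online setting, a sketch like this would be re-run whenever new failure or repair observations arrive from monitoring, so that the point and interval estimates track the current state of the system.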


Monitoring Station; System Availability; Fault Tree; Simple Network Management Protocol; Availability Evaluation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kesari Mishra (1)
  • Kishor S. Trivedi (1)
  1. Dept. of Electrical and Computer Engineering, Duke University, Durham, USA
