Abstract
We discuss availability aspects of large software-based systems. We classify faults into Bohrbugs, Mandelbugs and aging-related bugs, and then examine mitigation methods for the last two bug types. We also consider quantitative approaches to availability assurance.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Trivedi, K. et al. (2008). Achieving and Assuring High Availability. In: Nanya, T., Maruyama, F., Pataricza, A., Malek, M. (eds) Service Availability. ISAS 2008. Lecture Notes in Computer Science, vol 5017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68129-8_4
DOI: https://doi.org/10.1007/978-3-540-68129-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68128-1
Online ISBN: 978-3-540-68129-8
eBook Packages: Computer Science (R0)