Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems

  • Mohammad Ahmad Munawar
  • Paul A. S. Ward
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4742)


Self-managing systems require continuous monitoring to ensure correct operation. Detailed monitoring is often too costly to use in production. An alternative is adaptive monitoring, whereby monitoring is kept to a minimal level while the system behaves as expected, and the monitoring level is increased if a problem is suspected. To enable such an approach, we must model the system, both at a minimal level to ensure correct operation, and at a detailed level, to diagnose faulty components. To avoid the complexity of developing an explicit model based on the system structure, we employ simple statistical techniques to identify relationships in the monitored data. These relationships are used to characterize normal operation and identify problematic areas.

We develop and evaluate a prototype for the adaptive monitoring of J2EE applications. We experiment with 29 different fault scenarios of three general types, and show that we are able to detect the presence of faults in 80% of cases, with all but one of the undetected cases attributable to a single fault type. We are able to shortlist the faulty component in 65% of cases where anomalies are observed.
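The approach outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistical technique is used, so the sketch assumes simple pairwise linear regression between monitored metrics. The metric names, the R² threshold, the tolerance rule, and all function names are hypothetical choices made for illustration, not the authors' implementation. The sketch learns relationships from data collected during normal operation, flags samples that break them, ranks the metrics involved in the most broken relationships, and raises the monitoring level only when a problem is suspected.

```python
from itertools import combinations
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit ys ~ a + b * xs. Returns (a, b, r2), or None if a series is constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None
    b = sxy / sxx
    a = my - b * mx
    return a, b, (sxy * sxy) / (sxx * syy)


def learn_models(training, min_r2=0.9, slack=1.5, floor=1e-6):
    """Keep one linear model per metric pair that correlates strongly in normal operation.

    training maps a metric name to its list of samples (assumed format).
    """
    models = {}
    for u, v in combinations(sorted(training), 2):
        fit = fit_line(training[u], training[v])
        if fit is None or fit[2] < min_r2:
            continue  # relationship too weak to characterize normal behaviour
        a, b, _ = fit
        worst = max(abs(y - (a + b * x)) for x, y in zip(training[u], training[v]))
        # Assumed tolerance rule: scaled worst training residual, with a small floor.
        models[(u, v)] = (a, b, max(slack * worst, floor))
    return models


def violations(models, sample):
    """Return the metric pairs whose learned relationship the sample breaks."""
    broken = []
    for (u, v), (a, b, tol) in models.items():
        if u in sample and v in sample and abs(sample[v] - (a + b * sample[u])) > tol:
            broken.append((u, v))
    return broken


def rank_suspects(broken):
    """Shortlist metrics by how many broken relationships they take part in."""
    counts = {}
    for pair in broken:
        for metric in pair:
            counts[metric] = counts.get(metric, 0) + 1
    return sorted(counts, key=lambda m: (-counts[m], m))


def monitoring_level(broken):
    """Stay at the minimal level unless some relationship is violated."""
    return "detailed" if broken else "minimal"
```

In use, `learn_models` would run once on fault-free data, and `violations` would then be evaluated on each new sample collected at the minimal level. Only a non-empty result would switch collection to the detailed level, where `rank_suspects` helps narrow down the components to inspect.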


Keywords: Self-managing systems, adaptive monitoring, root-cause analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Mohammad Ahmad Munawar (1)
  • Paul A. S. Ward (1)
  1. Shoshin Distributed Systems Group, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
