Abstract
To reduce mean time to recovery (MTTR) in heterogeneous enterprise environments, it should be possible to quickly determine the root cause of a problem detected at a higher level, e.g. through a response time violation of a transaction category, and to resolve it. Many problem determination applications use a component dependency graph to pinpoint the root cause. However, such graphs are often manually constructed. This paper introduces a simple, non-intrusive technique, based on mining existing runtime monitored data, for constructing a dynamic dependency graph between the components of an enterprise environment. The graph is traversed to identify nodes that are the cause of response-time-related problems.
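The abstract does not spell out the traversal, so the following is only an illustrative sketch of one common heuristic, not the authors' algorithm. It assumes the dependency graph is already available as an adjacency map and that each component has been flagged as violating its response time threshold or not. The function name `root_cause_candidates`, the component names, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def root_cause_candidates(
    graph: Dict[str, List[str]], violating: Set[str], start: str
) -> List[str]:
    """Walk the dependency graph from `start`, following only violating nodes.

    A node is reported as a candidate when it is violating but none of the
    components it depends on is violating (assumed heuristic).
    """
    candidates: List[str] = []
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen or node not in violating:
            continue
        seen.add(node)
        # Dependencies of this node that also show a violation.
        bad_children = [c for c in graph.get(node, []) if c in violating]
        if bad_children:
            stack.extend(bad_children)
        else:
            candidates.append(node)
    return sorted(candidates)


# Hypothetical example: a transaction category and the components it uses.
graph = {
    "checkout_txn": ["web_server"],
    "web_server": ["app_server"],
    "app_server": ["database", "ldap"],
    "database": [],
    "ldap": [],
}
violating = {"checkout_txn", "web_server", "app_server", "database"}

print(root_cause_candidates(graph, violating, "checkout_txn"))  # ['database']
```

In this sketch the walk starts at the transaction category where the violation was detected and stops at the deepest violating components, which are then treated as likely root causes.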
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gupta, M., Neogi, A., Agarwal, M.K., Kar, G. (2003). Discovering Dynamic Dependencies in Enterprise Environments for Problem Determination. In: Brunner, M., Keller, A. (eds) Self-Managing Distributed Systems. DSOM 2003. Lecture Notes in Computer Science, vol 2867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39671-0_24
DOI: https://doi.org/10.1007/978-3-540-39671-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20314-8
Online ISBN: 978-3-540-39671-0
eBook Packages: Springer Book Archive