How to Keep Your Head above Water While Detecting Errors

  • Ignacio Laguna
  • Fahad A. Arshad
  • David M. Grothe
  • Saurabh Bagchi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5896)


Today’s distributed systems need runtime error detection to catch errors arising from software bugs, hardware errors, or unexpected operating conditions. A prominent class of error detection techniques operates in a stateful manner, i.e., it keeps track of the state of the application being monitored and then matches state-based rules. Large-scale distributed applications generate a high volume of messages that can overwhelm the capacity of a stateful detection system. An existing approach to handling this load is to randomly sample the messages and process only a subset. However, this approach leads to non-determinism in the detection system’s view of what state the application is in, which in turn degrades the quality of detection. We present an intelligent sampling algorithm and a Hidden Markov Model (HMM)-based algorithm to select the messages that the detection system processes and to determine the application states such that the non-determinism is minimized. We also present a mechanism for selectively triggering computationally intensive rules, based on a light-weight check of whether the rule is likely to be flagged. We demonstrate the techniques in a detection system called Monitor applied to a J2EE multi-tier application. We empirically evaluate the performance of Monitor under different load conditions and error scenarios and compare it to a previous system called Pinpoint.
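To make the idea of HMM-based state estimation concrete, the following is a minimal sketch of standard Viterbi decoding, which recovers the most likely sequence of hidden application states from a sequence of observed messages. It is not the algorithm used in Monitor. The state names, message names, and all probabilities are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical model of a three-tier application (all values are assumptions).
STATES = ["web", "app", "db"]
START_P = {"web": 0.8, "app": 0.1, "db": 0.1}
TRANS_P = {
    "web": {"web": 0.2, "app": 0.7, "db": 0.1},
    "app": {"web": 0.2, "app": 0.2, "db": 0.6},
    "db": {"web": 0.6, "app": 0.2, "db": 0.2},
}
EMIT_P = {
    "web": {"http_req": 0.8, "ejb_call": 0.1, "sql_query": 0.1},
    "app": {"http_req": 0.1, "ejb_call": 0.8, "sql_query": 0.1},
    "db": {"http_req": 0.1, "ejb_call": 0.1, "sql_query": 0.8},
}


def _log(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observed messages."""
    # best[s] = (log-probability of the best path ending in s, that path).
    # Log space avoids numeric underflow on long message sequences.
    best = {
        s: (_log(start_p[s]) + _log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        nxt = {}
        for s in states:
            # Pick the predecessor state that maximizes the path score.
            score, path = max(
                ((best[p][0] + _log(trans_p[p][s]), best[p][1]) for p in states),
                key=lambda t: t[0],
            )
            nxt[s] = (score + _log(emit_p[s][obs]), path + [s])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# All three messages are observed, so the decoded path follows the tiers.
full_path = viterbi(["http_req", "ejb_call", "sql_query"],
                    STATES, START_P, TRANS_P, EMIT_P)
```

If sampling drops a message, the same decoder still returns the single most probable state sequence for whatever was observed. This illustrates, in a simplified way, how a probabilistic model can reduce ambiguity about the application state.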


Stateful error detection; High throughput distributed applications; J2EE multi-tier systems; Intelligent sampling; Hidden Markov Model



Copyright information

© IFIP International Federation for Information Processing 2009

Authors and Affiliations

  • Ignacio Laguna (1)
  • Fahad A. Arshad (1)
  • David M. Grothe (1)
  • Saurabh Bagchi (1)
  1. Dependable Computing Systems Lab (DCSL), School of Electrical and Computer Engineering, Purdue University, USA
