Machine Learning, Volume 51, Issue 1, pp 73–107

An Empirical Study of Two Approaches to Sequence Learning for Anomaly Detection

  • Terran Lane
  • Carla E. Brodley

Abstract

This paper introduces the computer security domain of anomaly detection and formulates it as a machine learning task on temporal sequence data. In this domain, the goal is to develop a model or profile of the normal working state of a system user and to detect anomalous conditions as long-term deviations from the expected behavior patterns. We introduce two approaches to this problem: one employing instance-based learning (IBL) and the other using hidden Markov models (HMMs). Though not suitable for a comprehensive security solution, both approaches achieve anomaly identification performance sufficient for a low-level “focus of attention” detector in a multitier security system. Further, we evaluate model scaling techniques for the two approaches: two clustering techniques for the IBL approach and variation of the number of hidden states for the HMM approach. We find that over both model classes and a wide range of model scales, there is no significant difference in performance at recognizing the profiled user. We take this invariance as evidence that, in this security domain, limited memory models (e.g., fixed-length instances or low-order Markov models) can learn only part of the user identity information in which we're interested and that substantially different models will be necessary if dramatic improvements in user-based anomaly detection are to be achieved.
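As a rough illustration of the instance-based idea summarized above, the sketch below profiles a user with fixed-length instances drawn from a command stream and scores new data by its smoothed similarity to that profile. It is a minimal sketch under stated assumptions, not the authors' method: the position-wise match similarity, the window length, the smoothing width, and all function names and sample commands are hypothetical choices made for this example.

```python
from collections import deque

def windows(tokens, length):
    """Split a token stream into overlapping fixed-length instances."""
    return [tuple(tokens[i:i + length]) for i in range(len(tokens) - length + 1)]

def similarity(a, b):
    """Fraction of positions at which two equal-length instances agree.
    This measure is an assumption made for the sketch."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def smoothed_scores(profile, stream, length=3, smooth=4):
    """Score each instance by its best match in the profile, then average the
    last `smooth` scores so that only sustained deviations stand out."""
    recent = deque(maxlen=smooth)
    scores = []
    for inst in windows(stream, length):
        best = max(similarity(inst, p) for p in profile)
        recent.append(best)
        scores.append(sum(recent) / len(recent))
    return scores

# Hypothetical command streams, invented for illustration only.
normal = "cd ls vi make cd ls vi make cd ls vi make".split()
profile = set(windows(normal, 3))
same_user = "cd ls vi make cd ls vi make".split()
other_user = "ssh top ps kill ssh top ps kill".split()

print(smoothed_scores(profile, same_user))
print(smoothed_scores(profile, other_user))
```

A detector built this way would flag a stream once its smoothed score stays below some threshold; how that threshold is chosen is left open here.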

Keywords: anomaly detection, application, instance-based learning, hidden Markov models, computer security

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Terran Lane: Department of Computer Science, University of New Mexico, Albuquerque, USA
  • Carla E. Brodley: School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA
