Content-Based Methodology for Anomaly Detection on the Web
As became apparent after the tragic events of September 11, 2001, terrorist organizations and other criminal groups are increasingly using the legitimate ways of Internet access to conduct their malicious activities. Such actions cannot be detected by existing intrusion detection systems that are generally aimed at protecting computer systems and networks from some kind of “cyber attacks”. Preparation of an attack against the human society itself can only be detected through analysis of the content accessed by the users. The proposed study aims at developing an innovative methodology for abnormal activity detection, which uses web content as the audit information provided to the detection system. The new behavior-based detection method learns the normal behavior by applying an unsupervised clustering algorithm to the contents of publicly available web pages viewed by a group of similar users. In this paper, we represent page content by the well-known vector space model. The content models of normal behavior are used in real-time to reveal deviation from normal behavior at a specific location on the net. The detection algorithm sensitivity is controlled by a threshold parameter. The method is evaluated by the trade-off between the detection rate (TP) and the false positive rate (FP).
Keywordsinformation retrieval unsupervised clustering user modeling web security anomaly detection activity monitoring
Unable to display preview. Download preview PDF.
- 2.W. Lee, S.J. Stolfo, P. K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, J. Zhang, “Real Time Data Mining-based Intrusion Detection”, Proceedings of DISCEX II, 2001.Google Scholar
- 3.W. Lee, S.J. Stolfo, “A Framework for Constructing Features and Models for Intrusion Detection Systems”, ACM Transactions on Information and System Security, 2000, Vol. 3, No. 4.Google Scholar
- 4.W. Lee, S.J. Stolfo, “Data Mining Approaches for Intrusion Detection”, In Proceedings of the Seventh USENIX Security Symposium, San Antonio, TX, 1998.Google Scholar
- 7.J.S. Balasubramaniyan, J.O. Garcia-Fernandez, D Isacoff, E. Spafford, D. Zamboni, “An architecture for intrusion detection using autonomous agents”, Proceedings 14th Annual Computer Security Applications Conference, IEEE Comput. Soc, Los Alamitos, CA, USA, 1998, xiii+365, pp. 13–24.Google Scholar
- 8.J. Cannady, “Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks”, Proceedings of the 23rd National Information Systems Security Conference, 2000.Google Scholar
- 9.J. Cannady, “Neural Networks for Misuse Detection: Initial Results”, Proceedings of the Recent Advances in Intrusion Detection’ 98 Conference, 1998, pp. 31–47.Google Scholar
- 14.J. Cannady, “Applying CMAC-based on-line learning to intrusion detection”, In Proceedings of the International Joint Conference on Neural Networks, Italy, 2000, Vol. 5, pp. 405–410.Google Scholar
- 16.B.C. Rhodes, J.A. Mahaffey, J.D. Cannady, “Multiple Self-Organizing Maps for Intrusion Detection”, 23rd National Information Systems Security Conference, 2000.Google Scholar
- 17.E. Eskin, A. Arnold, M. Prerau, L. Portnoy, S. Stolfo, “A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data”, Data Mining in Security Applications, Kluwer Academic Publishers, 2002.Google Scholar
- 20.T. Fawcett, F. Provost, “Activity Monitoring: Noticing interesting changes in behavior”, Proceedings on the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999.Google Scholar
- 26.D. Hand, H. Mannila, P. Smyth, “Principles of Data Mining”, MIt Press, England, 2001.Google Scholar
- 27.U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, “From Data Mining to Knowledge Discovery in Databases”, AI Magazine, 1996, Vol. 17, No. 3, pp. 37–54.Google Scholar
- 29.A. Schenker, M. Last, H. Bunke, and A. Kandel, “Clustering of Web Documents using a Graph Model”, to appear in “Web Document Analysis: Challenges and Opportunities”, Apostolos Antonacopoulos and Jianying Hu (Editors), World Scientific, 2003.Google Scholar
- 30.G. Salton, Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer (Addison-Wesley, Reading, 1989).Google Scholar
- 32.X. Lu, Document retrieval: a structural approach, Information Processing and Management 26, 2 (1990) 209–218.Google Scholar
- 33.Han, J. and Kamber, M., “Data Mining: Concepts and Techniques”, Morgan Kaufmann, 2001.Google Scholar
- 34.K. Sequeira and M. Zaki, “ADMIT: Anomaly-based Data Mining for Intrusions”, Proceeding of SIGKDD 02, pp. 386–395, ACM, 2002.Google Scholar
- 36.R. Lemos, “What are the real risks of cyberterrorism?”, ZDNet, August 26, 2002, URL: http://zdnet.com.com/2100-1105-955293.html.
- 37.George Karypis, CLUTO — A Clustering Toolkit, Release 2.0, University of Minnesota, 2002 [http://www-users.cs.umn.edu/~karypis/cluto/download.html].