Content-Based Methodology for Anomaly Detection on the Web

  • Mark Last
  • Bracha Shapira
  • Yuval Elovici
  • Omer Zaafrany
  • Abraham Kandel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2663)


As became apparent after the tragic events of September 11, 2001, terrorist organizations and other criminal groups increasingly use legitimate means of Internet access to conduct malicious activities. Such actions cannot be detected by existing intrusion detection systems, which generally aim to protect computer systems and networks from “cyber attacks”. Preparation of an attack against society itself can only be detected by analyzing the content that users access. This study aims to develop a methodology for detecting abnormal activity that uses web content as the audit information provided to the detection system. The new behavior-based detection method learns normal behavior by applying an unsupervised clustering algorithm to the contents of publicly available web pages viewed by a group of similar users. In this paper, we represent page content with the well-known vector space model. The content models of normal behavior are used in real time to reveal deviations from normal behavior at a specific location on the net. The sensitivity of the detection algorithm is controlled by a threshold parameter. The method is evaluated by the trade-off between the detection rate (TP) and the false positive rate (FP).
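
The pipeline outlined in the abstract (vector space representation, unsupervised clustering of normal pages, a similarity threshold, and TP/FP evaluation) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The raw term-frequency weighting, the cosine similarity measure, the small k-means-style clustering seeded with the first k pages, the "flag a page when its best similarity falls below the threshold" rule, and all function names are assumptions introduced here.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a page as a sparse term-frequency vector (vector space model)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Centroid of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_pages(vectors, k, iterations=10):
    """Tiny k-means-style clustering under cosine similarity.

    Assumption: the first k vectors serve as seeds, which keeps the
    sketch deterministic; a real system would use a clustering toolkit.
    """
    centroids = [dict(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for vec in vectors:
            nearest = max(range(k), key=lambda i: cosine(vec, centroids[i]))
            groups[nearest].append(vec)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_vector(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


def is_abnormal(text, centroids, threshold):
    """Flag a page whose best similarity to any normal centroid is below threshold."""
    best = max(cosine(tf_vector(text), c) for c in centroids)
    return best < threshold


def rates(labeled_pages, centroids, threshold):
    """Return (TP rate, FP rate) for a list of (text, truly_abnormal) pairs."""
    flags = [(is_abnormal(t, centroids, threshold), label)
             for t, label in labeled_pages]
    positives = sum(1 for _, label in flags if label)
    negatives = len(flags) - positives
    tp = sum(1 for flag, label in flags if flag and label)
    fp = sum(1 for flag, label in flags if flag and not label)
    return (tp / positives if positives else 0.0,
            fp / negatives if negatives else 0.0)


if __name__ == "__main__":
    # Hypothetical toy data standing in for pages viewed by a group of users.
    normal_pages = [
        "football match score goal team",
        "stock market shares price trading",
        "football team league goal win",
        "market price shares investors trading",
    ]
    centroids = cluster_pages([tf_vector(p) for p in normal_pages], k=2)
    for threshold in (0.2, 0.5, 0.95):
        print(threshold, is_abnormal("football goal team win", centroids, threshold))
```

Raising the threshold in this sketch makes the detector more sensitive: more truly abnormal pages are flagged, but so are more normal ones, which is the TP/FP trade-off the abstract refers to.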


information retrieval, unsupervised clustering, user modeling, web security, anomaly detection, activity monitoring





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Mark Last (1)
  • Bracha Shapira (1)
  • Yuval Elovici (1)
  • Omer Zaafrany (1)
  • Abraham Kandel (2)

  1. Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  2. Department of Computer Science and Engineering, University of South Florida, Tampa, USA
