Leveraging One-Class SVM and Semantic Analysis to Detect Anomalous Content

  • Ozgur Yilmazel
  • Svetlana Symonenko
  • Niranjan Balasubramanian
  • Elizabeth D. Liddy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3495)


Experiments were conducted to test several hypotheses on methods for improving document classification for the malicious insider threat problem within the Intelligence Community. Bag-of-words (BOW) representations of documents were compared to Natural Language Processing (NLP) based representations in both the typical and one-class classification problems, using the Support Vector Machine algorithm. Results show that the NLP features significantly improved classifier performance over the BOW approach in both precision and recall, while using far fewer features. The one-class algorithm using NLP features demonstrated robustness when tested on new domains.
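As a rough illustration of two notions the abstract relies on, the following minimal sketch builds a bag-of-words representation of a document and computes precision and recall for a set of binary predictions. The function names, the whitespace tokenizer, and the sample inputs are assumptions made for illustration only; they are not the feature extraction or evaluation code used in the paper.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to term counts.

    Assumption: lowercasing plus whitespace tokenization stands in for
    whatever preprocessing a real system would apply.
    """
    return Counter(text.lower().split())


def precision_recall(predicted, actual):
    """Return (precision, recall) for the positive class.

    `predicted` and `actual` are parallel lists of booleans, where True
    marks a document flagged (or labeled) as belonging to the target class.
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical inputs, not data from the paper.
    print(bag_of_words("Missile test missile"))
    print(precision_recall([True, True, False, False],
                           [True, False, True, False]))
```

In a BOW representation every distinct term becomes a feature, which is why the feature count grows with vocabulary size; the abstract's point is that the NLP-based representation reached better precision and recall with far fewer features.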


Support Vector Machine; Natural Language Processing; Support Vector Machine Algorithm; Intelligence Community; Document Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ozgur Yilmazel (1)
  • Svetlana Symonenko (1)
  • Niranjan Balasubramanian (1)
  • Elizabeth D. Liddy (1)
  1. Center for Natural Language Processing, School of Information Studies, Syracuse University, Syracuse
