Multi-lingual Detection of Terrorist Content on the Web

  • Mark Last
  • Alex Markov
  • Abraham Kandel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3917)


Since the web is increasingly used by terrorist organizations for propaganda, disinformation, and other purposes, the ability to automatically detect terrorist-related content in multiple languages can be extremely useful. In this paper we describe a new, classification-based approach to multi-lingual detection of terrorist documents. The proposed approach builds on a recently developed graph-based web document representation model, combined with the popular C4.5 decision-tree classification algorithm. Evaluation is performed on a collection of 648 web documents in Arabic. The results demonstrate that documents downloaded from several known terrorist sites can be reliably discriminated from the content of Arabic news reports using a simple decision tree.
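The combination of a graph-based document representation with a C4.5-style split criterion can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details of the graph model, so the sketch assumes a simple variant in which nodes are unique terms and directed edges link adjacent terms. The function names (`doc_to_graph`, `gain_ratio`, `best_split`) and the toy documents are hypothetical, and only the root split of a decision tree is selected, using the gain ratio that C4.5 is known for.

```python
import math
from collections import Counter


def doc_to_graph(text):
    """Assumed graph model: nodes are unique terms, edges are adjacent term pairs."""
    tokens = text.lower().split()
    nodes = set(tokens)
    edges = {(a, b) for a, b in zip(tokens, tokens[1:])}
    return nodes, edges


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(has_feature, labels):
    """C4.5-style gain ratio of one binary feature (present / absent)."""
    present = [y for f, y in zip(has_feature, labels) if f]
    absent = [y for f, y in zip(has_feature, labels) if not f]
    n = len(labels)
    remainder = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    gain = entropy(labels) - remainder
    split_info = entropy(has_feature)
    return gain / split_info if split_info > 0 else 0.0


def best_split(docs, labels):
    """Pick the node or edge whose presence best separates the classes."""
    graphs = [doc_to_graph(d) for d in docs]
    candidates = set()
    for nodes, edges in graphs:
        candidates |= {("node", n) for n in nodes}
        candidates |= {("edge", e) for e in edges}

    def column(feature):
        kind, value = feature
        return [value in (g[0] if kind == "node" else g[1]) for g in graphs]

    # Sort for a deterministic tie-break; max() keeps the first best candidate.
    ordered = sorted(candidates, key=repr)
    return max(ordered, key=lambda f: gain_ratio(column(f), labels))


# Hypothetical toy corpus with neutral placeholder terms and two classes.
docs = ["alpha beta gamma", "alpha beta delta", "beta alpha gamma", "delta beta alpha"]
labels = ["A", "A", "B", "B"]
print(best_split(docs, labels))
```

In this toy corpus no single term separates the two classes, while the edge from `alpha` to `beta` does, so the selected root test is an edge feature. A full C4.5 tree would apply the same criterion recursively and then prune.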


Machine Translation, Arabic Language, Document Representation, Terrorist Content, Simple Decision Tree





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mark Last (1)
  • Alex Markov (1)
  • Abraham Kandel (2)
  1. Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
  2. Department of Computer Science and Engineering, University of South Florida, Tampa, USA
