Modified Frequency-Based Term Weighting Scheme for Accurate Dark Web Content Classification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8870)


Security informatics and intelligence computation play a vital role in detecting and classifying terrorist content on the web. Accurate web content classification using computational intelligence and security informatics increases the chances of early detection of potential terrorist activities. In this paper, we propose a modified frequency-based term weighting scheme for accurate Dark Web content classification. The proposed scheme is compared with common term weighting techniques used in text classification, namely Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF-IDF), and Term Frequency-Relative Frequency (tf.rf), on a dataset selected from the Dark Web Portal Forum. The experimental results show that classification based on the proposed scheme outperforms classification based on the other term weighting techniques in accuracy and the other evaluation measures.
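For orientation, the three baseline weightings named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the proposed modified scheme is not specified in the abstract and is not reproduced here. The idf form log(N / df) and the rf form log2(2 + a / max(1, c)) are the commonly used textbook definitions and are assumptions about what the paper uses; the toy documents and all function names are hypothetical.

```python
import math
from collections import Counter


def tf(doc):
    """Raw term frequency: how often each term occurs in one tokenized document."""
    return Counter(doc)


def idf(term, docs):
    """Inverse document frequency, log(N / df); 0.0 if the term occurs in no document."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def rf(term, pos_docs, neg_docs):
    """rf factor, assumed form: log2(2 + a / max(1, c)).

    a = number of positive-class documents containing the term,
    c = number of negative-class documents containing the term.
    """
    a = sum(1 for d in pos_docs if term in d)
    c = sum(1 for d in neg_docs if term in d)
    return math.log2(2 + a / max(1, c))


def weights(doc, pos_docs, neg_docs):
    """Return the TF, TF-IDF and tf.rf weight of every term in `doc`."""
    all_docs = pos_docs + neg_docs
    return {
        term: {
            "tf": count,
            "tf-idf": count * idf(term, all_docs),
            "tf.rf": count * rf(term, pos_docs, neg_docs),
        }
        for term, count in tf(doc).items()
    }


# Hypothetical toy corpus: two "positive" and two "negative" tokenized documents.
pos = [["attack", "plan", "forum"], ["attack", "forum"]]
neg = [["weather", "forum"], ["football", "plan"]]

w = weights(pos[0], pos, neg)
for term, scores in w.items():
    print(term, scores)
```

In this toy corpus, "attack" appears only in positive documents, so its rf factor is higher than that of "plan", which appears once in each class. Unsupervised TF-IDF cannot make that distinction because it ignores class labels.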


Keywords: Term Frequency Weighting, Text Classification, Dark Web





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Faculty of Computing, Universiti Teknologi Malaysia (UTM), Skudai, Malaysia
