Topic-Specific YouTube Crawling to Detect Online Radicalization

  • Swati Agarwal
  • Ashish Sureka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8999)


Online video-sharing platforms such as YouTube contain videos and users that promote hate and extremism. Because of the low barrier to publication and the anonymity it offers, YouTube is misused by some users and communities to post videos that disseminate hatred against a particular religion, country or person. We formulate the identification of such malicious videos as a search problem and present a focused-crawler based approach built from several components: a search strategy (algorithm), a node similarity metric, learning from exemplary profiles that serve as training data, a stopping criterion, a node classifier and a queue manager. We implement two versions of the focused crawler: best-first search and shark search. We conduct a series of experiments varying the seed, the number of n-grams in the language-model based comparer and the similarity threshold for the classifier, and report the results using standard Information Retrieval metrics: precision, recall and F-measure. The accuracy of the proposed solution on the sample dataset is 69% for best-first search and 74% for shark search. We perform a characterization study (by manual and visual inspection) of the anti-India hate and extremism promoting videos retrieved by the focused crawler, based on the terms present in the video titles, YouTube category, average video length, content focus and target audience. We also present the results of applying Social Network Analysis based measures to extract communities and identify core and influential users.
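
The components named in the abstract (similarity metric, exemplar profile, threshold classifier, queue manager, stopping criterion) can be illustrated with a minimal best-first sketch. This is not the authors' implementation: the in-memory `graph` and `text_of` dictionaries stand in for YouTube data, cosine similarity over character n-grams is an assumed substitute for the paper's language-model based comparer, and all function names, the neutral toy topic and the parameter defaults are hypothetical. Shark search is not shown.

```python
import heapq
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_first_crawl(graph, text_of, seed, exemplar, threshold=0.3, max_nodes=10, n=3):
    """Visit nodes in decreasing order of similarity to the exemplar text.

    Returns the visited nodes whose score met the threshold, in visit order.
    """
    profile = char_ngrams(exemplar, n)

    def score(node):
        return cosine(char_ngrams(text_of[node], n), profile)

    frontier = [(-score(seed), seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    relevant = []
    visited = 0
    while frontier and visited < max_nodes:  # stopping criterion (assumed: node budget)
        neg_score, node = heapq.heappop(frontier)
        visited += 1
        if -neg_score < threshold:
            continue  # classified as irrelevant: its links are not followed
        relevant.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                heapq.heappush(frontier, (-score(neighbour), neighbour))
    return relevant


def precision_recall_f1(retrieved, ground_truth):
    """Standard IR metrics over sets of retrieved and truly relevant items."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    hits = len(retrieved & ground_truth)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Because an irrelevant node is never expanded, a relevant node that is reachable only through it is missed. That trade-off between crawl cost and recall is why the seed and the similarity threshold are natural parameters to vary in experiments.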


Keywords

Mining User Generated Content, Social Media Analytics, Information Retrieval, Focused Crawler, Social Network Analysis, Hate and Extremism Detection, Video Sharing Website, Online Radicalization





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Swati Agarwal (1)
  • Ashish Sureka (2)

  1. Indraprastha Institute of Information Technology-Delhi (IIIT-D), India
  2. Software Analytics Research Lab (SARL), India
