Towards Detection of Child Sexual Abuse Media: Categorization of the Associated Filenames

  • Alexander Panchenko
  • Richard Beaufort
  • Hubert Naets
  • Cédrick Fairon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7814)


This paper addresses the problem of automatically identifying pedophile content. We present a filename categorization system that is trained to identify suspicious files on P2P networks. In our initial experiments, we used regular pornography data as a substitute for child pornography. The system separates filenames of pornographic media from other filenames with an accuracy of 91–97%.


Keywords: short text categorization, P2P networks, child pornography
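
The abstract does not say which features or learning algorithm the system uses, so the following is only a minimal sketch of filename categorization as a short-text classification task. It assumes a multinomial naive Bayes model over filename tokens, which is a swapped-in technique and not necessarily the authors' method. The names `tokenize`, `NaiveBayesFilenameClassifier`, and `accuracy`, the labels `adult` and `other`, and the toy filenames are all hypothetical and chosen for illustration; they are not data from the paper.

```python
import math
import re
from collections import Counter


def tokenize(filename):
    """Lowercase a filename and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", filename.lower()) if t]


class NaiveBayesFilenameClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, filenames, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.token_counts = {label: Counter() for label in self.labels}
        for name, label in zip(filenames, labels):
            self.token_counts[label].update(tokenize(name))
        # Vocabulary is the union of all tokens seen in training.
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, filename):
        tokens = tokenize(filename)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.labels:
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(clf, filenames, labels):
    """Fraction of filenames whose predicted label matches the true label."""
    hits = sum(clf.predict(n) == y for n, y in zip(filenames, labels))
    return hits / len(labels)


# Hypothetical toy data, far too small for any real evaluation.
train_names = [
    "xxx_adult_movie.avi",
    "hot_adult_clip_xxx.mpg",
    "linux_kernel_docs.pdf",
    "holiday_photos_2012.zip",
]
train_labels = ["adult", "adult", "other", "other"]

clf = NaiveBayesFilenameClassifier().fit(train_names, train_labels)
test_names = ["adult_clip.avi", "kernel_docs.zip"]
test_labels = ["adult", "other"]
print(accuracy(clf, test_names, test_labels))
```

Splitting on non-alphanumeric characters reflects the fact that filenames rarely contain whitespace. A realistic system would need far more training data, held-out evaluation, and probably richer features such as character n-grams, none of which this sketch attempts.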





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Alexander Panchenko (1)
  • Richard Beaufort (1)
  • Hubert Naets (1)
  • Cédrick Fairon (1)
  1. Université catholique de Louvain, Louvain-la-Neuve, Belgium
