Method for Pornography Filtering in the WEB Based on Automatic Classification and Natural Language Processing

  • Roman Suvorov
  • Ilya Sochenkov
  • Ilya Tikhomirov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8113)

Abstract

The paper presents a method for detecting pornography in web pages based on natural language processing. The classification method uses a feature set of single words and groups of words. Syntactic analysis is performed to extract collocations. A modification of TF-IDF is used to weight terms. An evaluation and comparison of classification quality and performance are given.
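As a rough illustration of the weighting step mentioned in the abstract, the sketch below computes plain TF-IDF weights over single words and word pairs. It makes two assumptions that do not come from the paper: adjacent word pairs stand in for the syntactically extracted collocations, and the standard TF-IDF formula stands in for the authors' modification, which the abstract does not specify. The function names `extract_terms` and `tfidf_weights` are hypothetical.

```python
import math
from collections import Counter


def extract_terms(text):
    """Return lowercased single words plus adjacent word pairs.

    Assumption: adjacent pairs are only a stand-in for the
    collocations that the paper extracts with syntax analysis.
    """
    words = text.lower().split()
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + pairs


def tfidf_weights(documents):
    """Return one {term: weight} dict per document using plain TF-IDF.

    Assumption: this is the textbook formula tf * log(N / df),
    not the modified weighting used by the authors.
    """
    term_lists = [extract_terms(doc) for doc in documents]

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))

    n_docs = len(documents)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = ["free adult video", "free weather report"]
    for doc, doc_weights in zip(docs, tfidf_weights(docs)):
        print(doc, doc_weights)
```

In this toy corpus the word "free" occurs in both documents, so its weight is zero, while terms unique to one document receive positive weights. A classifier would then be trained on such weighted feature vectors.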

Keywords

text classification; dynamic web content filtering; pornography detection; natural language processing; thematic importance characteristic

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Roman Suvorov (1)
  • Ilya Sochenkov (1)
  • Ilya Tikhomirov (1)
  1. Institute for Systems Analysis of Russian Academy of Sciences, Moscow, Russia
