Abstract
The paper presents a method for detecting pornography in web pages based on natural language processing. The classification method uses a feature set of single words and groups of words. Syntactic analysis is performed to extract collocations, and a modification of TF-IDF is used to weight the terms. The quality and performance of the classification are evaluated and compared.
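The feature-weighting idea can be illustrated with a minimal sketch. It makes several assumptions that go beyond the abstract. It uses plain TF-IDF, because the paper's modified weighting is not described here. It approximates the syntactically extracted collocations with adjacent word pairs, since no parser is involved. It also assumes a nearest-centroid classifier with cosine similarity, because the abstract does not name the classifier. All function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import defaultdict, Counter


def extract_terms(text):
    """Single words plus adjacent word pairs.

    Assumption: adjacent pairs stand in for the collocations the paper
    extracts with syntactic analysis (no parser is used here).
    """
    words = re.findall(r"\w+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def fit_idf(docs):
    """Standard inverse document frequency: idf(t) = log(N / df(t))."""
    df = Counter()
    for doc in docs:
        df.update(set(extract_terms(doc)))
    n = len(docs)
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf(text, idf):
    """Standard (unmodified) TF-IDF weights; terms unseen in training are dropped."""
    counts = Counter(extract_terms(text))
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Mean of sparse vectors (hypothetical class prototype)."""
    acc = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            acc[t] += w
    return {t: w / len(vectors) for t, w in acc.items()}


def classify(text, idf, centroids):
    """Assign the label whose centroid is most similar to the text."""
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy usage with a hypothetical labelled corpus.
train = [
    ("adult", "free adult video xxx"),
    ("adult", "xxx adult chat live"),
    ("other", "weather forecast for the city"),
    ("other", "city news and weather"),
]
idf = fit_idf([text for _, text in train])
centroids = {
    label: centroid([tfidf(t, idf) for l, t in train if l == label])
    for label in {l for l, _ in train}
}
print(classify("live adult video", idf, centroids))  # adult
```

Word pairs let a phrase carry weight that its individual words do not. The same pipeline would accept syntactically derived collocations or a different term-weighting formula in place of the two assumed components.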
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Suvorov, R., Sochenkov, I., Tikhomirov, I. (2013). Method for Pornography Filtering in the WEB Based on Automatic Classification and Natural Language Processing. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_31
DOI: https://doi.org/10.1007/978-3-319-01931-4_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)