Adult Content Filtering through Compression-Based Text Classification

  • Igor Santos
  • Patxi Galán-García
  • Aitor Santamaría-Ibirika
  • Borja Alonso-Isla
  • Iker Alabau-Sarasola
  • Pablo Garcia Bringas
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 189)


The Internet is a powerful source of information. However, some of the information available on the Internet is not suitable for every audience. For instance, pornography should not be shown to children. To this end, several text-filtering algorithms have been proposed that employ a Vector Space Model (VSM) representation of webpages. Nevertheless, these types of filters can be bypassed using different attacks. In this paper, we present the first adult content filtering tool that employs compression algorithms to represent the data, a representation that is resilient to these attacks. We show that this approach improves on the results of classic VSM models.
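As an illustration of the general idea behind compression-based text classification (not the authors' tool), the following minimal Python sketch assigns a text to the class whose sample corpus lets a compressor encode it in the fewest additional bytes. It assumes zlib as a stand-in for whatever compression model the paper actually uses; the toy corpora, the labels, and the function names are hypothetical and chosen only for demonstration.

```python
import zlib
from typing import Dict

# Hypothetical toy corpora, one per class (neutral topics for illustration only).
CORPORA: Dict[str, str] = {
    "cooking": (
        "preheat the oven and bake the dough until golden. "
        "mix flour, sugar and butter in a bowl. "
        "knead the dough and let it rest before baking."
    ),
    "sports": (
        "the team scored a goal in the final match. "
        "the player passed the ball to the striker. "
        "the league title was decided on the last day."
    ),
}


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to encode the text at maximum compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(document: str, corpus: str) -> int:
    """Additional bytes needed to compress the document once the corpus is known."""
    return compressed_size(corpus + " " + document) - compressed_size(corpus)


def classify(document: str, corpora: Dict[str, str]) -> str:
    """Return the label whose corpus yields the smallest compression overhead."""
    return min(corpora, key=lambda label: extra_bytes(document, corpora[label]))


if __name__ == "__main__":
    print(classify("mix flour, sugar and butter, then bake the dough until golden", CORPORA))
    print(classify("the striker scored a goal in the final match", CORPORA))
```

Because the compressor works on raw character sequences rather than on a fixed vocabulary of tokens, a sketch like this needs no tokenisation or feature-selection step, which is one plausible reason such representations are harder to evade than word-based VSM features.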


Keywords: content filtering, text processing, compression-based text classification





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Igor Santos (1)
  • Patxi Galán-García (1)
  • Aitor Santamaría-Ibirika (1)
  • Borja Alonso-Isla (1)
  • Iker Alabau-Sarasola (1)
  • Pablo Garcia Bringas (1)

  1. S3Lab, University of Deusto, Bilbao, Spain
