
Design Consideration for Improved Term Weighting Scheme for Pornographic Web sites

  • Conference paper
  • In: Pattern Analysis, Intelligent Security and the Internet of Things

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 355)

Abstract

Illicit Web content filtering is a content-based analysis technique applied to censor inappropriate content on the Internet. Web content filtering recognizes undesirable content by applying AI techniques, linguistic analysis, or machine learning to classify Web pages into a set of predefined categories. However, distinguishing between useful and harmful Web content remains a major research challenge, and failures to do so commonly lead to under-blocking and over-blocking. Further, extracting the best term representation for the classifier is a major limitation, owing both to the curse of dimensionality and to ambiguity: a feature can have the same term frequency (TF) in two or more categories yet carry different semantic meanings, as in illicit pornography versus sex education contexts. In addition, the high dimensionality of the features on a Web page, even one of moderate size, makes the term representation value for the classifier more complex to compute, which degrades classification performance. This research therefore proposes a modified term weighting scheme (TWS) for narrative and discrete Web content in order to improve classification performance. Characteristics of pornographic Web sites were extracted, and the significant characteristics were identified and mapped against term weighting factors. Initial results revealed that other criteria, such as rare features, have the potential to be regarded as significant criteria in a TWS for distinguishing highly similar Web content.
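The ambiguity problem described above, where one term has the same TF in two categories, can be illustrated with a small sketch. This is not the authors' modified TWS, which the abstract does not specify. It is a minimal illustration under stated assumptions: the four-document corpus and its category labels (`porn`, `sex_edu`) are hypothetical toy data, and inverse category frequency (ICF), one published supervised weighting factor, is swapped in to show how a category-aware factor can separate terms that raw TF, and even IDF, cannot.

```python
import math

# Hypothetical toy corpus of (category, tokens) pairs; NOT data from the paper.
CORPUS = [
    ("porn",    "sex video free sex xxx".split()),
    ("porn",    "sex xxx cam live".split()),
    ("sex_edu", "sex education health puberty sex".split()),
    ("sex_edu", "sex health contraception clinic".split()),
]


def tf(term, tokens):
    """Term frequency normalised by document length."""
    return tokens.count(term) / len(tokens)


def idf(term, corpus):
    """Classic inverse document frequency: log(N / df)."""
    n_docs = len(corpus)
    df = sum(1 for _, toks in corpus if term in toks)
    return math.log(n_docs / df) if df else 0.0


def icf(term, corpus):
    """Inverse category frequency: log(1 + |C| / cf), where cf is the
    number of distinct categories whose documents contain the term."""
    categories = {cat for cat, _ in corpus}
    cf = len({cat for cat, toks in corpus if term in toks})
    return math.log(1 + len(categories) / cf) if cf else 0.0


def tf_icf(term, tokens, corpus):
    """A simple supervised weight: TF scaled by the category factor."""
    return tf(term, tokens) * icf(term, corpus)


if __name__ == "__main__":
    # "sex" has identical TF in a porn page and a sex-education page,
    # and appears in every document, so its IDF collapses to zero.
    # "xxx" occurs in only one category, so ICF weights it higher.
    doc_tokens = CORPUS[0][1]
    for term in ("sex", "xxx"):
        print(term,
              round(tf(term, doc_tokens), 3),
              round(idf(term, CORPUS), 3),
              round(icf(term, CORPUS), 3),
              round(tf_icf(term, doc_tokens, CORPUS), 3))
```

The design choice here is deliberate: a term confined to fewer categories (a "rare" feature in the category sense) receives a larger factor, which loosely mirrors the abstract's observation that rare features may be a significant criterion. A real filter would need a much larger labelled corpus and would likely combine several factors.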



Acknowledgments

The authors would like to acknowledge the Ministry of Higher Education (MOHE) and Universiti Teknologi Malaysia (UTM) for supporting this research. We are also grateful for the comments from PARS'10 and the reviewers.

Author information

Correspondence to Hafsah Salam.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Salam, H., Maarof, M.A., Zainal, A. (2015). Design Consideration for Improved Term Weighting Scheme for Pornographic Web sites. In: Abraham, A., Muda, A., Choo, YH. (eds) Pattern Analysis, Intelligent Security and the Internet of Things. Advances in Intelligent Systems and Computing, vol 355. Springer, Cham. https://doi.org/10.1007/978-3-319-17398-6_25

  • DOI: https://doi.org/10.1007/978-3-319-17398-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17397-9

  • Online ISBN: 978-3-319-17398-6

  • eBook Packages: Engineering, Engineering (R0)
