Abstract
Illicit Web content filtering is a content-based analysis technique applied to censor inappropriate content on the Internet. It recognizes undesirable content by applying AI techniques, linguistic analysis, or machine learning to classify Web pages into a set of predefined categories. However, distinguishing useful from harmful Web content remains a major research challenge, and failures usually lead to under-blocking and over-blocking. Further, extracting the best term representation for a classifier presents a major limitation due to the curse of dimensionality, where a feature can have the same term frequency (TF) in two or more categories but different semantic meanings, for example in illicit pornography versus sex education contexts; this is also known as the ambiguity issue. In addition, the high dimensionality of features on a Web page, even one of moderate size, makes the term representation values for the classifier more complex, which degrades classification performance. Thus, this research proposes a modified term weighting scheme (TWS) for narrative and discrete Web content in order to increase classification performance. Characteristics of pornographic Web sites were extracted, and the significant ones were identified and mapped against term weighting factors. Initial results revealed that other criteria, such as rare features, have the potential to serve as significant criteria in a TWS for distinguishing highly similar Web content.
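The ambiguity described above can be illustrated with a minimal sketch. It is not the scheme proposed in this chapter: the toy corpus, the category names, and the helper functions are all hypothetical, and inverse category frequency (ICF) is used only as one well-known example of a rarity-based weighting factor. The sketch shows a term with identical TF in two categories, which TF alone cannot separate, and how a rarity factor shifts weight toward terms that occur in fewer categories.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not data from the chapter): tokenized pages per category.
corpus = {
    "pornography": [["sex", "explicit", "video"], ["sex", "xxx", "video"]],
    "sex_education": [["sex", "health", "education"], ["sex", "contraception", "health"]],
}

def term_frequency(docs):
    """Raw count of each term summed over all documents of one category."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts

def inverse_category_frequency(corpus):
    """ICF(t) = log(number of categories / number of categories containing t)."""
    n_categories = len(corpus)
    category_count = Counter()
    for docs in corpus.values():
        # Count each term at most once per category.
        category_count.update({term for doc in docs for term in doc})
    return {t: math.log(n_categories / c) for t, c in category_count.items()}

tf = {cat: term_frequency(docs) for cat, docs in corpus.items()}
icf = inverse_category_frequency(corpus)

# Assumed combination for illustration: weight = TF * ICF.
weights = {
    cat: {term: tf[cat][term] * icf[term] for term in tf[cat]}
    for cat in corpus
}

for cat in corpus:
    print(cat, {t: round(w, 3) for t, w in sorted(weights[cat].items())})
```

In this toy data the term "sex" has the same TF in both categories, so its TF-ICF weight is zero in each, while terms confined to a single category keep a positive weight. A real scheme would need smoothing and normalization, which are omitted here.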
Acknowledgments
The authors would like to acknowledge the Ministry of Higher Education (MOHE) and Universiti Teknologi Malaysia (UTM) for supporting this research. We are also grateful for the comments from PARS’10 and the reviewers.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Salam, H., Maarof, M.A., Zainal, A. (2015). Design Consideration for Improved Term Weighting Scheme for Pornographic Web sites. In: Abraham, A., Muda, A., Choo, YH. (eds) Pattern Analysis, Intelligent Security and the Internet of Things. Advances in Intelligent Systems and Computing, vol 355. Springer, Cham. https://doi.org/10.1007/978-3-319-17398-6_25
DOI: https://doi.org/10.1007/978-3-319-17398-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17397-9
Online ISBN: 978-3-319-17398-6
eBook Packages: Engineering, Engineering (R0)