Wireless Personal Communications, Volume 93, Issue 2, pp 503–522

An Efficient Multiclass Classifier Using On-Page Positive Personality Features for Web Page Classification for the Next Generation Wireless Communication Networks

  • Vinod Kumar Bhalla
  • Neeraj Kumar
Article

Abstract

Over the years, wireless communication networks have been widely used by a large community of users in a wide variety of applications such as intelligent transportation systems, energy management, safety, and security. However, a large number of user requests can create performance bottlenecks in parts of the network with respect to QoS parameters such as congestion and network delay. Hence, an efficient classification technique is required to reduce congestion in the network so that the throughput of various applications can be increased. Classification helps in searching, sorting, retrieving, and querying documents over wireless networks. The World Wide Web (WWW) contains a huge repository of information in the form of web pages, and the Internet is growing day by day. This huge repository makes it challenging to collect and process the relevant information for a particular domain, and traditional text classification techniques are difficult to apply to rapidly growing web-based content. Hence, novel approaches and techniques need to be devised to reduce the manual effort in web page classification. With these points in focus, this paper proposes a novel multiclass classifier for next generation wireless networks, based on the unique personality features of web pages belonging to a particular domain category. In the proposed scheme, personality features are collected and assigned weights, and the classifier is then trained on these features. The results show that the proposed classifier successfully classified news, education, resume, online shopping, and research web pages from a large database repository. The accuracy of the proposed classifier is found to be satisfactory on a large data set of different categories. In addition, the proposed scheme yields a 10–15% overall performance gain in comparison with other existing schemes.
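
The idea of weighted on-page features described in the abstract can be illustrated with a minimal sketch. This is not the authors' classifier: the feature terms, their weights, and the function names below are all hypothetical, and the weights are fixed by hand rather than learned from training data as in the paper. The sketch only shows how per-category weighted feature counts can drive a multiclass decision over the five categories the abstract names.

```python
import re
from collections import Counter

# Hypothetical "personality" features and weights per category.
# These values are assumptions for illustration only; the paper
# collects its features and trains its classifier on real pages.
CATEGORY_FEATURES = {
    "news": {"headline": 2.0, "reporter": 1.5, "breaking": 1.0},
    "education": {"syllabus": 2.0, "admission": 1.5, "faculty": 1.0},
    "resume": {"skills": 2.0, "objective": 1.5, "experience": 1.0},
    "shopping": {"cart": 2.0, "checkout": 1.5, "price": 1.0},
    "research": {"abstract": 2.0, "references": 1.5, "citation": 1.0},
}


def tokenize(text):
    """Lowercase the page text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_page(text):
    """Return a weighted feature score for every category."""
    counts = Counter(tokenize(text))
    return {
        category: sum(weight * counts[term] for term, weight in features.items())
        for category, features in CATEGORY_FEATURES.items()
    }


def classify(text):
    """Pick the highest-scoring category, or 'unknown' if nothing matches."""
    scores = score_page(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

For example, `classify("Add to cart, then checkout. Price: 10")` selects the shopping category because only that category's terms appear in the text. A trained model, such as a multiclass support vector machine, would replace the hand-set weights in a realistic setting.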

Keywords

Multiclassifier; Webpage classification; Accuracy; Internet of Things; Classifier

Notes

Acknowledgments

We are thankful to all the anonymous reviewers for providing valuable comments and suggestions which improved the overall content, quality, and presentation of the paper.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, Thapar University, Patiala, India
