A Novel Framework for Web Page Classification Using Two-Stage Neural Network

  • Conference paper
Advanced Data Mining and Applications (ADMA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3584)

Abstract

Web page classification is one of the essential techniques for Web mining. This paper presents a framework for Web page classification built as a hybrid architecture of two neural networks: PCA (principal component analysis) and SOFM (self-organizing feature map). To perform the classification, a Web page is first represented by a vector of features whose weights depend on term frequency and on the importance of each sentence in the page. Because the number of features is large, PCA is used to select the relevant features. The output of PCA is then sent to the SOFM for classification. For comparison with the proposed framework, two conventional classifiers are used in our experiments: k-NN and Naïve Bayes. The new method significantly improves classification on both data sets compared with the two conventional methods.
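The three stages described in the abstract (weighted feature vector, PCA reduction, SOFM classification) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the sentence-importance scheme, so the per-sentence weights here are hypothetical inputs; PCA is computed with an SVD rather than with a PCA neural network; and each map unit is labeled by a majority vote of the training pages mapped to it, which is one common way to turn an unsupervised map into a classifier. All function names and the toy data are made up for illustration.

```python
import numpy as np

def weighted_term_vector(sentences, sentence_weights, vocab):
    """Term counts per page, each occurrence scaled by its sentence's weight."""
    index = {term: i for i, term in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for sentence, weight in zip(sentences, sentence_weights):
        for token in sentence.lower().split():
            if token in index:
                vec[index[token]] += weight
    return vec

def pca_fit(X, k):
    """Return the feature mean and the top-k principal directions (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def pca_transform(X, mean, components):
    """Project centered data onto the principal directions."""
    return (X - mean) @ components.T

def som_bmu(weights, x):
    """Index of the best-matching unit (closest weight vector) for x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))

def som_train(X, grid=(2, 2), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small self-organizing map; returns one weight vector per unit."""
    rng = np.random.default_rng(seed)
    coords = np.array([(r, c) for r in range(grid[0]) for c in range(grid[1])],
                      dtype=float)
    # Initialize unit weights from randomly chosen training samples.
    weights = X[rng.integers(0, len(X), size=len(coords))].astype(float)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood radius
        for x in X[rng.permutation(len(X))]:
            bmu = som_bmu(weights, x)
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def label_units(weights, X, y):
    """Assumed labeling step: majority class of training points per unit."""
    votes = {}
    for x, label in zip(X, y):
        votes.setdefault(som_bmu(weights, x), []).append(label)
    return {unit: max(set(ls), key=ls.count) for unit, ls in votes.items()}

if __name__ == "__main__":
    vocab = ["goal", "match", "team", "stock", "market", "price"]
    pages = [  # toy data; the first sentence gets a hypothetical weight of 2.0
        (["Team wins match", "late goal decides match"], "sport"),
        (["Goal drought ends", "team scores in match"], "sport"),
        (["Stock market falls", "price drops on market"], "finance"),
        (["Price rally lifts stock", "market closes higher"], "finance"),
    ]
    X = np.array([weighted_term_vector(s, [2.0, 1.0], vocab) for s, _ in pages])
    y = [label for _, label in pages]
    mean, comps = pca_fit(X, k=2)
    Z = pca_transform(X, mean, comps)
    som = som_train(Z)
    unit_labels = label_units(som, Z, y)
    for z, label in zip(Z, y):
        print(label, "->", unit_labels.get(som_bmu(som, z)))
```

A new page would follow the same path: build its weighted vector, project it with the stored mean and components, find its best-matching unit, and read off that unit's label. A unit that received no training pages has no label in this sketch, so a real system would need a fallback such as the nearest labeled unit.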

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Cao, Y., Zhu, Q., Zhu, Z. (2005). A Novel Framework for Web Page Classification Using Two-Stage Neural Network. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science, vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_60

  • DOI: https://doi.org/10.1007/11527503_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27894-8

  • Online ISBN: 978-3-540-31877-4

  • eBook Packages: Computer Science, Computer Science (R0)
