Web Page Classification: A Soft Computing Approach

  • Angela Ribeiro
  • Víctor Fresno
  • María C. Garcia-Alegre
  • Domingo Guinea
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2663)

Abstract

The Internet makes it possible to share and manipulate a vast quantity of information efficiently and effectively, but the rapid and chaotic growth of the Net has produced a poorly organized environment that hinders the sharing and mining of useful data. Meaningful web-page classification techniques are therefore becoming an urgent need. This paper describes a novel approach to web-page classification based on a fuzzy representation of web pages: a doublet representation that associates a weight with each of the most representative words of a web document to characterize that word's relevance in the document. The weight is derived by exploiting the characteristics of the HTML language. A fuzzy-rule-based classifier is then generated by a supervised learning process that uses a genetic algorithm to search for the minimum fuzzy-rule set that best covers the training examples. The proposed system has been demonstrated on two significantly different classes of web pages.
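The pipeline in the abstract (HTML-weighted word doublets, a fuzzy-rule classifier, and a genetic search for a small rule set) can be illustrated with the minimal sketch below. It is not the authors' method: every name (`TAG_WEIGHTS`, `doublets`, `triangle`, `FUZZY_SETS`, `classify`, `accuracy`, `ga_select`) is hypothetical, and the per-tag multipliers, the triangular "low/medium/high" fuzzy sets, max-min inference, and the bit-string GA whose fitness is accuracy minus a per-rule penalty are all assumptions chosen for illustration.

```python
import random
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical emphasis multipliers per HTML tag (assumption, not the paper's criteria).
TAG_WEIGHTS = {"title": 4.0, "h1": 3.0, "h2": 2.5, "h3": 2.0, "b": 1.5, "strong": 1.5, "em": 1.5}
SKIP_TAGS = {"script", "style"}


class WeightedTextParser(HTMLParser):
    """Accumulates a tag-weighted score for every word of an HTML page."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags
        self.scores = Counter()  # word -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if SKIP_TAGS.intersection(self.stack):
            return  # ignore script/style contents
        # Each word gets the largest multiplier among its enclosing tags (1.0 for plain text).
        boost = max((TAG_WEIGHTS.get(t, 1.0) for t in self.stack), default=1.0)
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += boost


def doublets(html, k=10):
    """Return the k highest-scoring (word, weight) doublets, weights scaled to [0, 1]."""
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()
    top = parser.scores.most_common(k)
    if not top:
        return []
    peak = top[0][1]
    return [(word, score / peak) for word, score in top]


def triangle(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Assumed linguistic labels over the normalised weight in [0, 1].
FUZZY_SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}


def classify(pairs, rules):
    """Max-min inference. A rule is ({word: label, ...}, class); absent words weigh 0."""
    weights = dict(pairs)
    strength = {}
    for antecedents, cls in rules:
        fire = min(triangle(weights.get(w, 0.0), *FUZZY_SETS[lab]) for w, lab in antecedents.items())
        strength[cls] = max(strength.get(cls, 0.0), fire)
    if not strength or max(strength.values()) == 0.0:
        return None, strength  # no rule covers this page
    return max(strength, key=strength.get), strength


def accuracy(rules, examples):
    """Fraction of (pairs, class) examples that the rule set labels correctly."""
    hits = sum(classify(pairs, rules)[0] == cls for pairs, cls in examples)
    return hits / len(examples)


def ga_select(candidates, examples, pop_size=20, generations=40, penalty=0.01, seed=0):
    """Bit-string GA over candidate rules; fitness = accuracy minus a penalty per selected rule."""
    rng = random.Random(seed)
    n = len(candidates)

    def pick(mask):
        return [r for bit, r in zip(mask, candidates) if bit]

    def fitness(mask):
        return accuracy(pick(mask), examples) - penalty * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]  # elitism: carry over the two best masks unchanged
        while len(nxt) < pop_size:
            a, b = (max(rng.sample(pop, 3), key=fitness) for _ in range(2))  # size-3 tournaments
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [bit ^ (rng.random() < 1.0 / n) for bit in a[:cut] + b[cut:]]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return pick(max(pop, key=fitness))
```

The size penalty is what pushes the search toward a *minimum* rule set; keeping `penalty` below `1 / len(examples)` ensures that adding a rule which correctly labels even one more example still raises fitness.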


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Angela Ribeiro (1)
  • Víctor Fresno (2)
  • María C. Garcia-Alegre (1)
  • Domingo Guinea (1)
  1. Industrial Automation Institute, Spanish Council for Scientific Research, Arganda del Rey, Madrid, Spain
  2. Escuela Superior de Ciencia y Tecnología, Universidad Rey Juan Carlos, Spain
