International Conference on Intelligent Data Engineering and Automated Learning

Intelligent Data Engineering and Automated Learning – IDEAL 2015 pp 9-17

Web Genre Classification via Hierarchical Multi-label Classification

  • Gjorgji Madjarov
  • Vedrana Vidulin
  • Ivica Dimitrovski
  • Dragi Kocev
Conference paper

DOI: 10.1007/978-3-319-24834-9_2

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9375)
Cite this paper as:
Madjarov G., Vidulin V., Dimitrovski I., Kocev D. (2015) Web Genre Classification via Hierarchical Multi-label Classification. In: Jackowski K., Burduk R., Walkowiak K., Wozniak M., Yin H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2015. Lecture Notes in Computer Science, vol 9375. Springer, Cham

Abstract

The growing number of web pages calls for improvements to search engines. One such improvement is to let users specify the desired web genre of the returned pages, which creates the need to predict a page's genre from the information on the page. This task is typically addressed as multi-class classification, with some recent studies advocating multi-label classification. In this paper, we propose to exploit the web genre labels by constructing a hierarchy of web genres and then applying methods for hierarchical multi-label classification to boost predictive performance. We use two methods for hierarchy construction: expert-based and data-driven. The evaluation on a benchmark dataset (the 20-Genre collection corpus) reveals that using a hierarchy of web genres significantly improves the predictive performance of the classifiers, and that the data-driven hierarchy yields performance similar to that of the expert-based one, with the added value that it is obtained automatically and quickly.
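
As an illustration of what a genre hierarchy adds to a multi-label problem, the sketch below expands a page's genre labels with their ancestors, so that a page labeled with a leaf genre is also labeled with every genre group above it. This is only a minimal sketch of the usual hierarchy constraint in hierarchical multi-label classification. The two-level hierarchy, the genre and group names, and the helper `with_ancestors` are hypothetical and are not taken from the paper or from the 20-Genre collection.

```python
# Hypothetical two-level genre hierarchy: child -> parent.
# None marks a top-level node. These names are illustrative assumptions only.
PARENT = {
    "informative": None,
    "commercial": None,
    "faq": "informative",
    "blog": "informative",
    "shopping": "commercial",
}


def with_ancestors(labels, parent=PARENT):
    """Return the label set closed under the hierarchy constraint.

    Every label is kept, and each of its ancestors is added, so a page
    that belongs to a genre also belongs to all of that genre's groups.
    """
    closed = set()
    for label in labels:
        node = label
        # Walk up towards the root; stop early if this path was already added.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent[node]
    return closed


# A page carrying two leaf genres gains both of their parent groups.
page_labels = {"blog", "shopping"}
print(sorted(with_ancestors(page_labels)))
# ['blog', 'commercial', 'informative', 'shopping']
```

A flat multi-label classifier would see only `blog` and `shopping` for this page. With the expanded label set, a hierarchical method can also learn from the shared parent groups.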

Keywords

Web genre classification; Hierarchy construction; Hierarchical multi-label classification

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Gjorgji Madjarov (1)
  • Vedrana Vidulin (2)
  • Ivica Dimitrovski (1)
  • Dragi Kocev (3, 4)

  1. Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, Macedonia
  2. Ruđer Bošković Institute, Zagreb, Croatia
  3. Department of Informatics, University of Bari Aldo Moro, Bari, Italy
  4. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
