Assessing the Impact of Class-Imbalanced Data for Classifying Relevant/Irrelevant Medline Documents

  • Reyes Pavón
  • Rosalía Laza
  • Miguel Reboiro-Jato
  • Florentino Fdez-Riverola
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 93)

Abstract

Class imbalance is a well-known problem in many practical applications of machine learning, and its effect on the performance of standard classifiers is considerable. In this paper we investigate whether classifying Medline documents using the MeSH controlled vocabulary poses additional challenges when the prediction task is class-imbalanced. To this end, we evaluate the performance of Bayesian networks combined with several available strategies for overcoming the effect of class imbalance. Our results show both that Bayesian network classifiers are sensitive to class imbalance and that existing techniques can improve their overall performance.

Keywords

document classification, imbalanced data, Medline documents, MeSH terms, Bayesian networks

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Reyes Pavón (1)
  • Rosalía Laza (1)
  • Miguel Reboiro-Jato (1)
  • Florentino Fdez-Riverola (1)
  1. ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Ourense, Spain
