Assessing the Impact of Class-Imbalanced Data for Classifying Relevant/Irrelevant Medline Documents
Class imbalance is a well-known problem in many practical applications of machine learning, and its effects on the performance of standard classifiers are considerable. In this paper we investigate whether classifying Medline documents using the MeSH controlled vocabulary poses additional challenges for class-imbalanced prediction. For this task, we evaluate the performance of Bayesian network classifiers combined with several available strategies for overcoming the effect of class imbalance. Our results show both that Bayesian network classifiers are sensitive to class imbalance and that existing techniques can improve their overall performance.
Keywords: document classification, imbalanced data, Medline documents, MeSH terms, Bayesian networks