Active Learning: Applying RinSCut Thresholding Strategy to Uncertainty Sampling

  • Kang Hyuk Lee
  • Judy Kay
  • Byeong Ho Kang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2903)

Abstract

Many supervised learning approaches to text classification need a large volume of manually labeled documents to achieve a high level of performance. Manual labeling is time-consuming and expensive, and it introduces some level of inconsistency. Two common ways to reduce the number of expensive labeled examples are (1) selecting informative, uncertain examples for human labeling rather than relying on random sampling, and (2) using a large amount of inexpensive unlabeled data together with a small number of manually labeled examples. Previous research has focused on one approach at a time and has shown a considerable reduction in the number of labeled examples needed to achieve a given level of performance. In this paper, we investigate a new framework that combines both approaches for similarity-based text classification. By applying our new thresholding strategy, RinSCut, to conventional uncertainty sampling, the framework automatically selects informative uncertain data to be presented to a human expert for labeling, and positive-certain data that can be used directly for learning without human labeling. Extensive experiments were conducted on the Reuters-21578 dataset to compare the proposed scheme with random sampling and conventional uncertainty sampling, based on micro- and macro-averaged F1. The results suggest that when both macro- and micro-averaged measures matter, our new approach may be the best choice.
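The selection step described in the abstract can be pictured roughly as follows. This is only a minimal sketch, not the authors' implementation: the function name `partition_by_score`, the per-category similarity scores, and the two thresholds `lower` and `upper` are all assumptions made for illustration, and the abstract does not say how RinSCut derives its thresholds.

```python
from typing import Dict, List, Tuple


def partition_by_score(
    scores: Dict[str, float],
    lower: float,
    upper: float,
) -> Tuple[List[str], List[str], List[str]]:
    """Split unlabeled documents for one category into three groups.

    scores: document id -> similarity score against the category.
    lower, upper: hypothetical thresholds bounding the uncertain band
    (assumed here; the abstract does not specify how they are set).
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")

    uncertain: List[str] = []         # to be shown to a human expert
    positive_certain: List[str] = []  # used for learning without human labeling
    rest: List[str] = []              # neither queried nor auto-labeled

    for doc_id, score in scores.items():
        if score >= upper:
            positive_certain.append(doc_id)
        elif score >= lower:
            uncertain.append(doc_id)
        else:
            rest.append(doc_id)
    return uncertain, positive_certain, rest


if __name__ == "__main__":
    demo_scores = {"d1": 0.9, "d2": 0.5, "d3": 0.1, "d4": 0.7}
    to_label, auto_positive, skipped = partition_by_score(demo_scores, 0.4, 0.7)
    print("ask human:", to_label)
    print("auto-label positive:", auto_positive)
    print("leave unlabeled:", skipped)
```

In this sketch only the documents in the uncertain band cost human effort, while the high-scoring documents are added to the training data as positive examples at no labeling cost, which is the combination of the two approaches the abstract describes.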

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Kang Hyuk Lee (1)
  • Judy Kay (1)
  • Byeong Ho Kang (2)
  1. School of Information Technologies, University of Sydney, Australia
  2. School of Computing, University of Tasmania, Hobart, Australia
