A Novel Field Learning Algorithm for Dual Imbalance Text Classification

  • Ling Zhuang
  • Honghua Dai
  • Xiaoshu Hang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3614)

Abstract

The Fish-net algorithm is a novel field learning algorithm that derives classification rules by examining the range of values of each attribute rather than individual point values. In this paper, we present a Feature Selection Fish-net learning algorithm to address the dual imbalance problem in text classification. Dual imbalance comprises instance imbalance and feature imbalance: instance imbalance is caused by unevenly distributed classes, and feature imbalance by differences in document length. The proposed approach consists of two phases: (1) select a feature subset made up of the features that are more supportive of the difficult minority class; (2) construct classification rules using the original Fish-net algorithm. Our experimental results on Reuters-21578 show that the proposed approach achieves a better-balanced accuracy rate on both the majority and minority classes than Naive Bayes Multinomial and SVM.
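
The two phases described in the abstract can be illustrated with a small Python sketch. It is not the authors' Fish-net implementation: the feature score (minority-class document frequency minus majority-class document frequency), the use of relative term frequency to offset document length, the exact [min, max] ranges, the hit-count decision rule, and every function name are illustrative assumptions. The sketch only shows the two ideas the abstract relies on: rules expressed as a range of values per attribute rather than point values, and accuracy reported per class rather than overall.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = Dict[str, float]                               # term -> relative frequency
Fields = Dict[str, Dict[str, Tuple[float, float]]]   # class -> term -> (low, high)


def normalize(text: str) -> Doc:
    """Relative term frequencies, so long and short documents share one scale (assumed length fix)."""
    counts = Counter(text.split())
    total = sum(counts.values()) or 1
    return {term: n / total for term, n in counts.items()}


def select_minority_features(docs: List[Doc], labels: List[str], minority: str, k: int) -> List[str]:
    """Hypothetical phase-1 score: minority document frequency minus majority document frequency."""
    n_min = sum(1 for y in labels if y == minority)
    n_rest = len(labels) - n_min
    df_min: Counter = Counter()
    df_rest: Counter = Counter()
    for doc, y in zip(docs, labels):
        (df_min if y == minority else df_rest).update(doc.keys())
    scores = {t: df_min[t] / max(n_min, 1) - df_rest[t] / max(n_rest, 1)
              for t in set(df_min) | set(df_rest)}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # best first, ties broken by term
    return [t for t in ranked[:k] if scores[t] > 0]          # keep only minority-leaning terms


def learn_fields(docs: List[Doc], labels: List[str], features: List[str]) -> Fields:
    """Simplified phase 2: the [low, high] range of each selected feature within each class."""
    fields: Fields = {}
    for doc, y in zip(docs, labels):
        per_class = fields.setdefault(y, {})
        for f in features:
            v = doc.get(f, 0.0)                      # an absent term counts as 0.0
            lo, hi = per_class.get(f, (v, v))
            per_class[f] = (min(lo, v), max(hi, v))
    return fields


def classify(doc: Doc, fields: Fields) -> str:
    """Pick the class whose ranges contain the most of the document's feature values."""
    def hits(c: str) -> int:
        return sum(lo <= doc.get(f, 0.0) <= hi for f, (lo, hi) in fields[c].items())
    return max(sorted(fields), key=hits)  # ties go to the alphabetically first class


def per_class_accuracy(y_true: List[str], y_pred: List[str]) -> Tuple[Dict[str, float], float]:
    """Accuracy within each class (recall) and their unweighted mean (balanced accuracy)."""
    classes = sorted(set(y_true))
    recall = {c: sum(t == p == c for t, p in zip(y_true, y_pred)) / sum(t == c for t in y_true)
              for c in classes}
    return recall, sum(recall.values()) / len(recall)


if __name__ == "__main__":
    train = [("grain wheat export tonnes", "grain"), ("wheat grain harvest", "grain"),
             ("shares profit quarter", "other"), ("profit shares dividend quarter results", "other"),
             ("bank rates profit", "other"), ("shares market quarter", "other")]
    docs = [normalize(text) for text, _ in train]
    labels = [y for _, y in train]
    features = select_minority_features(docs, labels, minority="grain", k=2)
    fields = learn_fields(docs, labels, features)
    test = [("wheat grain prices", "grain"), ("profit quarter", "other"), ("grain shares", "grain")]
    preds = [classify(normalize(text), fields) for text, _ in test]
    print(features)  # ['grain', 'wheat']
    print(preds)     # ['grain', 'other', 'other']
    print(per_class_accuracy([y for _, y in test], preds))  # ({'grain': 0.5, 'other': 1.0}, 0.75)
```

On this toy data the third test document falls outside the learned minority-class ranges and is misclassified. Overall accuracy is 2/3, but accuracy on the minority class is only 0.5, which is why per-class figures matter under instance imbalance. Exact min-max ranges are also brittle; the Fish-net work this paper builds on targets inexact rules, which the sketch does not attempt.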

Keywords

Feature Selection, Receiver Operating Characteristic Curve, Area Under Curve, Feature Subset, Minority Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ling Zhuang (1)
  • Honghua Dai (1)
  • Xiaoshu Hang (1)
  1. School of Information Technology, Deakin University, Australia

Personalised recommendations