
Rough Set Feature Selection Algorithms for Textual Case-Based Classification

  • Kalyan Moy Gupta
  • David W. Aha
  • Philip Moore
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4106)

Abstract

Feature selection algorithms can reduce the high dimensionality of textual cases and increase case-based task performance. However, conventional algorithms (e.g., information gain) are computationally expensive. We previously showed that, on one dataset, a rough set feature selection algorithm can reduce computational complexity without sacrificing task performance. Here we test the generality of our findings on additional feature selection algorithms and an additional dataset, and we improve our empirical methodology. We observed that features of textual cases vary in their contribution to task performance according to their part of speech, and we adapted the algorithms to include a part-of-speech bias as background knowledge. Our evaluation shows that injecting this bias significantly increases task performance for the rough set algorithms, and that one of them attained significantly higher classification accuracy than information gain. We also confirmed that, under some conditions, randomized training partitions can dramatically reduce training times for rough set algorithms without compromising task performance.
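To make the contrast between the two families of feature selectors concrete, the following minimal sketch scores binary term features by information gain and, separately, approximates a rough set reduct from a discernibility matrix using a Johnson-style greedy heuristic. It is an illustration under stated assumptions, not the authors' algorithms: the toy term-by-document matrix, the binary term-presence encoding, and the function names (`information_gain`, `discernibility_entries`, `johnson_reduct`) are hypothetical, and neither the part-of-speech bias nor the randomized training partitions described in the abstract are modeled.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(cases, labels, feature):
    """IG of a binary term feature: H(C) - H(C | term absent/present)."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(cases, labels) if x[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def discernibility_entries(cases, labels):
    """Non-empty cells of a discernibility matrix: for each pair of cases
    with different classes, the set of features whose values differ."""
    entries = []
    for i, j in combinations(range(len(cases)), 2):
        if labels[i] != labels[j]:
            diff = frozenset(
                f for f in range(len(cases[i])) if cases[i][f] != cases[j][f]
            )
            if diff:
                entries.append(diff)
    return entries


def johnson_reduct(cases, labels):
    """Greedy (Johnson-style) reduct approximation: repeatedly keep the
    feature that occurs in the most still-uncovered matrix entries."""
    uncovered = discernibility_entries(cases, labels)
    selected = []
    while uncovered:
        counts = Counter(f for entry in uncovered for f in entry)
        # Highest count wins; ties go to the lowest feature index.
        best = max(counts, key=lambda f: (counts[f], -f))
        selected.append(best)
        uncovered = [e for e in uncovered if best not in e]
    return sorted(selected)


# Hypothetical toy data: 4 documents x 4 terms (1 = term present).
cases = [
    [1, 0, 1, 0],  # class A
    [1, 1, 0, 0],  # class A
    [0, 1, 1, 1],  # class B
    [0, 0, 0, 1],  # class B
]
labels = ["A", "A", "B", "B"]

ig = {f: information_gain(cases, labels, f) for f in range(len(cases[0]))}
ranking = sorted(ig, key=lambda f: -ig[f])
reduct = johnson_reduct(cases, labels)
print("IG ranking:", ranking)     # [0, 3, 1, 2]
print("Greedy reduct:", reduct)   # [0]
```

On this toy data, terms 0 and 3 each separate the classes perfectly, so information gain ranks them equally, while the greedy reduct keeps only term 0 because one discerning term suffices (ties are broken by lowest index). Note that this naive construction compares every pair of differently labeled cases, so its cost grows quadratically with the number of cases.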

Keywords

Feature Selection; Information Gain; Feature Selection Algorithm; Discernibility Matrix; Reduce Training Time



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kalyan Moy Gupta, Knexus Research Corp., Springfield, USA
  • David W. Aha, Naval Research Laboratory (Code 5515), Washington, USA
  • Philip Moore, AES Division, ITT Industries, Alexandria, USA
