Rough Set Feature Selection Methods for Case-Based Categorization of Text Documents

Gupta, Kalyan Moy; Moore, Philip G.; Aha, David W.; Pal, Sankar K.

doi:10.1007/11590316_128

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3776)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

Textual case bases can contain thousands of features in the form of tokens or words, which can inhibit classification performance. Recent developments in rough set theory and its applications to feature selection offer promising approaches for selecting and reducing the number of features. We adapt two rough set feature selection methods for use on n-ary class text categorization problems. We also introduce a new method for selecting features that computes the union of features selected from randomly partitioned training subsets. A comparative evaluation of our method against a conventional method on the Reuters-21578 data set shows that our method can dramatically decrease training time without compromising classification accuracy. We also found that randomized training set partitions dramatically reduce training time.
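
The partition-and-union idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which two rough set methods were adapted, so the selector below is an assumption: a greedy, Johnson-style heuristic that picks features until every pair of differently labeled cases is discerned. The function names `johnson_reduct` and `union_of_partition_features`, the representation of a case as a set of tokens, and the `n_parts` and `seed` parameters are all hypothetical and chosen only for illustration.

```python
import random
from itertools import combinations


def johnson_reduct(cases, labels):
    """Greedy Johnson-style heuristic (an assumed stand-in selector).

    cases  -- list of sets of tokens (binary features)
    labels -- list of class labels, one per case
    Returns a set of tokens that discerns every pair of cases that have
    different labels and differ in at least one token.
    """
    # For each differently labeled pair, record the tokens on which it differs.
    pending = []
    for i, j in combinations(range(len(cases)), 2):
        if labels[i] != labels[j]:
            diff = cases[i] ^ cases[j]  # symmetric difference of token sets
            if diff:  # identical cases cannot be discerned by any token
                pending.append(diff)

    selected = set()
    while pending:
        # Count how many still-undiscerned pairs each token would resolve.
        counts = {}
        for diff in pending:
            for token in diff:
                counts[token] = counts.get(token, 0) + 1
        # Pick the most frequent token; ties go to the alphabetically first.
        best = max(sorted(counts), key=counts.get)
        selected.add(best)
        pending = [diff for diff in pending if best not in diff]
    return selected


def union_of_partition_features(cases, labels, n_parts, seed=0):
    """Select features on random training subsets and return their union."""
    rng = random.Random(seed)
    order = list(range(len(cases)))
    rng.shuffle(order)  # random assignment of cases to partitions
    selected = set()
    for p in range(n_parts):
        idx = order[p::n_parts]  # every n_parts-th case of the shuffled order
        selected |= johnson_reduct([cases[i] for i in idx],
                                   [labels[i] for i in idx])
    return selected


# Toy data, invented for illustration only.
cases = [{"oil", "price"}, {"oil", "barrel"},
         {"wheat", "price"}, {"wheat", "farm"}]
labels = ["crude", "crude", "grain", "grain"]
print(johnson_reduct(cases, labels))
print(union_of_partition_features(cases, labels, n_parts=2))
```

In this sketch the selector compares cases pairwise, so its cost grows roughly quadratically with the number of cases. Splitting n cases into k subsets replaces one pass over about n² pairs with k passes over about (n/k)² pairs each, which is one plausible reason partitioning could shorten training; the paper's own methods and timings may differ.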

References

Chouchoulas, A., Shen, Q.: Rough-set aided keyword reduction for text categorization. Applied Artificial Intelligence 15, 843–873 (2001)
Johnson, D.S.: Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences 9, 256–278 (1974)
Li, Y., Shiu, S.C.K., Pal, S.K.: Combining feature reduction and case selection in building CBR classifiers. In: Pal, S.K., Aha, D.W., Gupta, K.M. (eds.) Case-based reasoning in knowledge discovery and data mining. Wiley, New York (2005) (to appear)
Pal, S.K., Shiu, S.C.K.: Foundations of soft case-based reasoning. Wiley, Hoboken (2004)
Pawlak, Z.: Rough sets: Theoretical aspects of reasoning about data. Kluwer, Dordrecht (1991)
Popova, V.N.: Knowledge discovery and monotonicity. Doctoral dissertation, Rotterdam School of Economics, Erasmus University, The Netherlands (2004)
Reuters (2005), http://www.daviddlewis.com/resources/testcollections/reuters21578
Wiratunga, N., Koychev, I., Massie, S.: Feature selection and generalization for retrieval of textual cases. In: Proceedings of the Seventh European Conference on Case-Based Reasoning, Madrid, Spain, pp. 806–820. Springer, Heidelberg (2004)
Yang, Y., Pedersen, J.: A comparative study of feature selection in text categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 412–420. Morgan Kaufmann, Nashville (1997)

Author information

Authors and Affiliations

ITT Industries, 2560 Huntington Ave, Alexandria, VA, USA
Kalyan Moy Gupta, Philip G. Moore & David W. Aha
Naval Research Laboratory, 4555 Overlook Ave, SW, Washington, DC, USA
Kalyan Moy Gupta & Philip G. Moore
Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata, India
Sankar K. Pal

Editor information

Editors and Affiliations

Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, India
Sankar K. Pal
Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, 700108, Kolkata
Sanghamitra Bandyopadhyay
Machine Intelligence Unit, Indian Statistical Institute, 700 108, Kolkata, India
Sambhunath Biswas

About this paper

Cite this paper

Gupta, K.M., Moore, P.G., Aha, D.W., Pal, S.K. (2005). Rough Set Feature Selection Methods for Case-Based Categorization of Text Documents. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2005. Lecture Notes in Computer Science, vol 3776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590316_128

DOI: https://doi.org/10.1007/11590316_128
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30506-4
Online ISBN: 978-3-540-32420-1
eBook Packages: Computer Science, Computer Science (R0)
