A Rough Set-Based Approach to Text Classification

  • Conference paper
New Directions in Rough Sets, Data Mining, and Granular-Soft Computing (RSFDGrC 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1711)

Abstract

A non-trivial obstacle to good text classification for information filtering and retrieval (IF/IR) is the high dimensionality of the data. This paper proposes a technique using Rough Set Theory to alleviate this problem. Given corpora of documents and a training set of classified example documents, the technique locates a minimal set of co-ordinate keywords that distinguishes between classes of documents, reducing the dimensionality of the keyword vectors. This simplifies the creation of knowledge-based IF/IR systems, speeds up their operation, and allows easy editing of the rule bases employed. The paper describes the proposed technique, discusses the integration of a keyword acquisition algorithm with a rough set-based dimensionality reduction algorithm, and provides experimental results of a proof-of-concept implementation.
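
To make the idea of rough set-based keyword reduction concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes binary keyword-presence vectors and uses a greedy search on the rough-set dependency degree (the fraction of training documents whose keyword pattern determines their class unambiguously). The function names, the toy keywords, and the toy documents are all hypothetical, and a greedy search is only a heuristic: it yields a small discriminating keyword subset, not necessarily a minimal one.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in the positive region, i.e.
    rows whose equivalence class carries exactly one class label."""
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Add, one at a time, the keyword that most raises the dependency
    degree, until it equals the degree of the full keyword set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy data: one row per document, one column per keyword.
keywords = ["data", "rough", "match", "team"]
rows = [
    (1, 1, 0, 0),
    (1, 0, 0, 1),
    (1, 0, 1, 1),
    (0, 0, 1, 0),
]
labels = ["cs", "cs", "sport", "sport"]

reduct = greedy_reduct(rows, labels)
print([keywords[a] for a in reduct])  # prints ['match']
```

In this toy example the four-dimensional keyword vectors collapse to a single keyword, because the presence of "match" alone separates the two classes. Rules written over the reduced vectors are correspondingly shorter, which is the kind of simplification the abstract describes.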

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chouchoulas, A., Shen, Q. (1999). A Rough Set-Based Approach to Text Classification. In: Zhong, N., Skowron, A., Ohsuga, S. (eds) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing. RSFDGrC 1999. Lecture Notes in Computer Science, vol 1711. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48061-7_16

  • DOI: https://doi.org/10.1007/978-3-540-48061-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66645-5

  • Online ISBN: 978-3-540-48061-7

  • eBook Packages: Springer Book Archive
