Pattern Extraction Method for Text Classification

  • Chapter
Technologies for Constructing Intelligent Systems 1

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 89)

Abstract

Classification quality can be improved by applying a feature extraction algorithm, i.e. an algorithm that finds new and more relevant features, before the learning procedure. In this paper, we investigate a novel feature extraction method for textual data. Texts (documents) are usually represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words that occur in both the text pattern and the document under consideration. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve classification quality on almost all of the textual data examined.
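The feature definition in the abstract can be illustrated with a minimal sketch. It only shows how a numerical attribute could be computed from a given text pattern; it does not show how the patterns themselves are found with rough set methods or the Lattice Machine, which the abstract does not detail. The function names, the regular-expression tokenizer, the lowercase normalization, and the choice to count distinct words rather than repeated occurrences are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Return the set of distinct lowercase word tokens in a text.

    Assumption: a simple regex tokenizer; the chapter may use a
    different preprocessing step (e.g. stemming or keyword selection).
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pattern_feature(pattern, document):
    """Number of distinct words occurring in both the pattern and the document.

    `pattern` is a set of lowercase words (a hypothetical text pattern).
    """
    return len(set(pattern) & tokenize(document))


def extract_features(patterns, document):
    """Map a document to one numerical attribute per text pattern."""
    return [pattern_feature(p, document) for p in patterns]


# Hypothetical patterns and document, for illustration only.
patterns = [
    {"rough", "set", "lattice"},
    {"text", "classification", "document"},
]
doc = "Text classification of each text document with rough set methods"

features = extract_features(patterns, doc)
print(features)  # [2, 3]
```

The resulting vector can be appended to, or used in place of, a word-based representation before a learning algorithm is applied. Counting repeated occurrences instead of distinct words would be an equally plausible reading of the definition.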

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nguyen, H.S., Wang, H. (2002). Pattern Extraction Method for Text Classification. In: Bouchon-Meunier, B., Gutiérrez-Ríos, J., Magdalena, L., Yager, R.R. (eds) Technologies for Constructing Intelligent Systems 1. Studies in Fuzziness and Soft Computing, vol 89. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1797-3_18

  • DOI: https://doi.org/10.1007/978-3-7908-1797-3_18

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-662-00329-9

  • Online ISBN: 978-3-7908-1797-3

  • eBook Packages: Springer Book Archive
