Pattern Extraction Method for Text Classification

Nguyen, Hung Son; Wang, Hui

doi:10.1007/978-3-7908-1797-3_18

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 89)

Abstract

Classification quality can be improved by applying a feature extraction algorithm, i.e. an algorithm that finds new and more relevant features, before the learning procedure. In this paper, we investigate a novel feature extraction method for textual data. Texts (documents) are usually represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words occurring in both the text pattern and the document under consideration. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve classification quality on almost all of the textual data considered.
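
The feature definition in the abstract (the number of words shared by a text pattern and a document) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `pattern_feature`, `extract_features`) and the two hand-picked patterns are hypothetical, and the sketch assumes that distinct words are counted rather than repeated occurrences, which the abstract does not specify. How patterns are actually discovered (via rough set methods and the Lattice Machine) is not shown here.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens.

    Assumption: a simple alphanumeric tokenizer; the chapter may use
    stemming or keyword selection instead.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pattern_feature(pattern, document):
    """Count the distinct words occurring in both the pattern and the document."""
    return len(set(pattern) & tokenize(document))


def extract_features(patterns, document):
    """Map a document to one numerical attribute per text pattern."""
    return [pattern_feature(p, document) for p in patterns]


# Hypothetical, hand-picked patterns used only for illustration.
patterns = [
    {"rough", "set", "lattice"},
    {"stock", "market", "price"},
]
doc = "Rough set methods and the Lattice Machine are applied to text."

features = extract_features(patterns, doc)
print(features)  # [3, 0]
```

Each document is thus turned into a vector of small integers, one per pattern, which can be passed to any standard learner in place of, or alongside, the original word-based representation.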

Author information

Authors and Affiliations

Institute of Mathematics, Warsaw University, Banacha 2, Warsaw, 02095, Poland
Hung Son Nguyen
School of Information and Software Engineering, University of Ulster at Jordanstown, N. Ireland, BT37 0QB
Hui Wang

Editor information

Editors and Affiliations

CNRS, LIP6, Université Paris VI, 8 rue du Capitaine Scott, 75015, Paris, France
Bernadette Bouchon-Meunier
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660, Boadilla del Monte, Madrid, Spain
Julio Gutiérrez-Ríos
Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain
Luis Magdalena
Machine Intelligence Institute, Iona College, 10801, New Rochelle, NY, USA
Ronald R. Yager

About this chapter

Cite this chapter

Nguyen, H.S., Wang, H. (2002). Pattern Extraction Method for Text Classification. In: Bouchon-Meunier, B., Gutiérrez-Ríos, J., Magdalena, L., Yager, R.R. (eds) Technologies for Constructing Intelligent Systems 1. Studies in Fuzziness and Soft Computing, vol 89. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1797-3_18

DOI: https://doi.org/10.1007/978-3-7908-1797-3_18
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-662-00329-9
Online ISBN: 978-3-7908-1797-3
eBook Packages: Springer Book Archive
