Abstract
The quality of classification can be improved by applying a feature extraction algorithm, i.e. an algorithm that finds new and more relevant features, before the learning procedure. In this paper, we investigate a novel feature extraction method for textual data. Usually, texts (documents) are represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words occurring in both the text pattern and the considered document. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve classification quality on almost all of the textual data considered.
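The feature definition above can be illustrated with a minimal sketch. It assumes that a text pattern is a plain set of lowercase words, that a document is reduced to its set of distinct words by a simple regular-expression tokenizer, and that the feature counts distinct shared words rather than repeated occurrences. The names `tokenize`, `pattern_feature`, and `extract_features`, as well as the example patterns, are hypothetical and are not taken from the chapter; how patterns are actually found (via rough set methods and the Lattice Machine) is not shown here.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct alphabetic words.

    Assumption: a naive tokenizer; the chapter may use stemming or keywords.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def pattern_feature(pattern, document):
    """Number of distinct words occurring in both the pattern and the document."""
    return len(set(pattern) & tokenize(document))


def extract_features(patterns, document):
    """Map a document to one numerical attribute per text pattern."""
    return [pattern_feature(p, document) for p in patterns]


# Hypothetical patterns and document, for illustration only.
patterns = [{"rough", "set", "lattice"}, {"stock", "market", "price"}]
doc = "Rough set theory and the lattice machine are used to classify text."
features = extract_features(patterns, doc)
print(features)
```

Each document then becomes a short numeric vector, one entry per pattern, which any standard learning procedure can consume in place of, or alongside, the raw word representation.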
© 2002 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nguyen, H.S., Wang, H. (2002). Pattern Extraction Method for Text Classification. In: Bouchon-Meunier, B., Gutiérrez-Ríos, J., Magdalena, L., Yager, R.R. (eds) Technologies for Constructing Intelligent Systems 1. Studies in Fuzziness and Soft Computing, vol 89. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1797-3_18
DOI: https://doi.org/10.1007/978-3-7908-1797-3_18
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-662-00329-9
Online ISBN: 978-3-7908-1797-3
eBook Packages: Springer Book Archive