Abstract
This paper addresses text categorization with unsupervised learning techniques that do not require labeled data. We propose a feature learning bootstrapping algorithm (FLB) that starts from a small number of seed words and automatically learns features for each category from a large amount of unlabeled documents. Using these learned features, we develop a new Naïve Bayes classifier named NB_FLB. Experimental results show that the NB_FLB classifier performs better than other Naïve Bayes classifiers trained by supervised learning when the number of features is small.
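The general idea of growing category features from seed words and then fitting a Naïve Bayes model on unlabeled text can be illustrated with a minimal sketch. The abstract does not specify how FLB scores candidate words or how NB_FLB is estimated, so everything below is an assumption made for illustration: the co-occurrence scoring, the `rounds` and `top_k` parameters, the pseudo-labeling step, the function names, and the toy corpus are all hypothetical and are not the authors' method.

```python
import math
from collections import Counter


def bootstrap_features(docs, seeds, rounds=2, top_k=2):
    """Grow per-category feature sets from seed words.

    Assumed scoring (not from the paper): a word scores for a category
    by co-occurring in a document with that category's current features.
    """
    features = {c: set(ws) for c, ws in seeds.items()}
    for _ in range(rounds):
        scores = {c: Counter() for c in features}
        for doc in docs:
            words = set(doc)
            for c, feats in features.items():
                overlap = len(words & feats)
                for w in words:
                    scores[c][w] += overlap
        known = set().union(*features.values())
        candidates = {w for d in docs for w in d} - known
        for c in features:
            # Keep only words that score strictly higher here than elsewhere.
            margin = {}
            for w in candidates:
                other = max((scores[o][w] for o in features if o != c), default=0)
                if scores[c][w] > other:
                    margin[w] = scores[c][w] - other
            best = sorted(margin, key=lambda w: (-margin[w], w))[:top_k]
            features[c] |= set(best)
    return features


def train_nb(docs, features):
    """Fit a multinomial Naive Bayes model on pseudo-labeled documents.

    Assumption: each unlabeled document takes the category whose learned
    features it matches most often; documents with no match are skipped.
    """
    vocab = set().union(*features.values())
    counts = {c: Counter() for c in features}
    n_docs = Counter()
    for doc in docs:
        hits = {c: sum(w in f for w in doc) for c, f in features.items()}
        label = max(sorted(hits), key=hits.get)
        if hits[label] == 0:
            continue
        n_docs[label] += 1
        counts[label].update(w for w in doc if w in vocab)
    total_docs = sum(n_docs.values())
    model = {}
    for c in features:
        total = sum(counts[c].values())
        # Laplace smoothing for both the prior and the word likelihoods.
        prior = math.log((n_docs[c] + 1) / (total_docs + len(features)))
        loglik = {w: math.log((counts[c][w] + 1) / (total + len(vocab)))
                  for w in vocab}
        model[c] = (prior, loglik)
    return model


def classify(model, doc):
    """Return the category with the highest log posterior score."""
    def score(c):
        prior, loglik = model[c]
        return prior + sum(loglik[w] for w in doc if w in loglik)
    return max(sorted(model), key=score)


# Toy unlabeled corpus (hypothetical data), already tokenized.
docs = [
    ["football", "match", "goal", "team"],
    ["team", "coach", "goal", "league"],
    ["stock", "market", "bank", "shares"],
    ["bank", "loan", "shares", "market"],
    ["coach", "league", "match"],
    ["loan", "stock", "interest"],
]
seeds = {"sports": ["football"], "finance": ["stock"]}

features = bootstrap_features(docs, seeds)
model = train_nb(docs, features)
print(classify(model, ["goal", "team", "coach"]))
print(classify(model, ["bank", "loan", "market"]))
```

In this sketch the classifier only sees words that belong to a learned feature set, which mirrors the small-feature-set setting the abstract mentions. A faithful implementation would need the scoring function and stopping criterion described in the full paper.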
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jingbo, Z., Wenliang, C., Tianshun, Y. (2004). Using Seed Words to Learn to Categorize Chinese Text. In: Vicedo, J.L., Martínez-Barco, P., Muñoz, R., Saiz Noeda, M. (eds) Advances in Natural Language Processing. EsTAL 2004. Lecture Notes in Computer Science, vol 3230. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30228-5_41
DOI: https://doi.org/10.1007/978-3-540-30228-5_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23498-2
Online ISBN: 978-3-540-30228-5
eBook Packages: Springer Book Archive