Short-Text Classification Based on ICA and LSA
Many applications, such as word-sense disambiguation and information retrieval, can benefit from text classification. Text classifiers based on Independent Component Analysis (ICA) try to make the most of the independent components of text documents and in many cases achieve good classification results. Short-text documents, however, usually share few feature terms, and in this case ICA does not work well. Our aim is to solve the short-text problem in text classification by using Latent Semantic Analysis (LSA), which maps documents into a low-dimensional latent semantic space, as a data preprocessing method and then applying ICA to the preprocessed data. Our experiment on Chinese short-text classification shows that using ICA and LSA together yields better classification results than using ICA alone.
Keywords: Text Classification, Independent Component Analysis, Latent Semantic Analysis, Feature Term, Document Corpus
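The pipeline described in the abstract (LSA as preprocessing, then ICA on the reduced data) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state the ICA algorithm, the number of LSA dimensions, or how independent components are turned into class labels, so the symmetric FastICA update with a tanh nonlinearity and the nearest-centroid classification rule below are assumed stand-ins, and the function names `lsa`, `fastica`, `fit` and `predict` are hypothetical.

```python
import numpy as np


def lsa(X, k):
    """LSA step: truncated SVD of a term-by-document matrix X (terms x docs).

    Returns the k x docs document coordinates U_k^T X (= S_k V_k^T) and the
    term basis U_k, which is reused to fold in unseen documents.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]
    return Uk.T @ X, Uk


def _sym_decorrelate(W):
    """Return (W W^T)^{-1/2} W, so that the rows of W are orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W


def fastica(Z, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with the tanh (log-cosh) nonlinearity (assumed variant).

    Z is (signals x samples). Returns an unmixing matrix B and the row means,
    so that B @ (Z - mean) gives whitened, approximately independent components.
    """
    mean = Z.mean(axis=1, keepdims=True)
    Zc = Z - mean
    # Whitening: K = C^{-1/2} makes the covariance of K @ Zc the identity.
    d, E = np.linalg.eigh(Zc @ Zc.T / Zc.shape[1])
    K = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Zw = K @ Zc
    n, m = Zw.shape
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(max_iter):
        G = np.tanh(W @ Zw)
        # Fixed-point update for every row w: w <- E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = _sym_decorrelate(G @ Zw.T / m - np.diag((1 - G**2).mean(axis=1)) @ W)
        # Converged when each new row is (up to sign) parallel to the old one.
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ K, mean


def fit(X, labels, k):
    """Train: LSA to k dimensions, ICA on the LSA coordinates, then one
    centroid per class in ICA space (the centroid rule is an assumption)."""
    Z, Uk = lsa(X, k)
    B, mean = fastica(Z)
    S = B @ (Z - mean)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([S[:, labels == c].mean(axis=1) for c in classes], axis=1)
    return {"Uk": Uk, "B": B, "mean": mean, "classes": classes, "centroids": centroids}


def predict(model, X_new):
    """Fold new documents (terms x docs) into LSA space, map them through the
    ICA unmixing matrix, and assign each to the nearest class centroid."""
    S = model["B"] @ (model["Uk"].T @ X_new - model["mean"])
    dists = np.linalg.norm(S[:, :, None] - model["centroids"][:, None, :], axis=0)
    return model["classes"][np.argmin(dists, axis=1)]
```

The LSA step is applied first because, for short texts, two documents on the same topic may share no terms at all, so their raw term vectors are orthogonal; projecting onto the top-k singular directions relates them through co-occurring terms before ICA looks for independent directions. Folding in a new document with `Uk.T @ x` reuses the training SVD instead of recomputing it.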