Short-Text Classification Based on ICA and LSA

  • Qiang Pu
  • Guo-Wei Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3972)


Many applications, such as word-sense disambiguation and information retrieval, can benefit from text classification. Text classifiers based on Independent Component Analysis (ICA) try to make the most of the independent components of text documents and in many cases achieve good classification performance. Short-text documents, however, usually share few feature terms, and in this case ICA does not work well. Our aim is to solve the short-text problem in text classification by using Latent Semantic Analysis (LSA) as a data preprocessing method and then applying ICA to the preprocessed data. The experiment shows that, in Chinese short-text classification, using ICA and LSA together provides better classification performance than using ICA alone.
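To make the short-text problem concrete, the sketch below illustrates only the LSA preprocessing step, under loudly stated assumptions: raw term-frequency counts with no weighting, a truncated SVD obtained from the eigenvectors of A A^T by power iteration (workable only for tiny matrices; a real system would more likely call a library routine such as numpy.linalg.svd), and a hypothetical toy corpus of English tokens rather than Chinese text. The abstract does not state the authors' term weighting, the number of retained dimensions, the ICA variant, or the classifier, so none of those are reproduced here. The helper names (term_doc_matrix, top_eigenpairs, lsa_doc_vectors) are hypothetical and chosen for illustration.

```python
import math


def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix A (terms x docs); no tf-idf or other weighting."""
    return [[doc.count(term) for doc in docs] for term in vocab]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_eigenpairs(m, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation.

    Adequate for tiny, well-separated spectra only; not a general SVD routine.
    """
    n = len(m)
    m = [row[:] for row in m]  # copy, because deflation modifies the matrix
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so that the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_doc_vectors(a, k):
    """Map each document (column of A) to k latent dimensions: U_k^T d."""
    n_terms, n_docs = len(a), len(a[0])
    # Eigenvectors of A A^T are the left singular vectors U of A;
    # its eigenvalues are the squared singular values.
    aat = [[sum(a[i][d] * a[j][d] for d in range(n_docs)) for j in range(n_terms)]
           for i in range(n_terms)]
    basis = [u for _, u in top_eigenpairs(aat, k)]
    return [[sum(u[t] * a[t][d] for t in range(n_terms)) for u in basis]
            for d in range(n_docs)]


# Hypothetical toy corpus of one- and two-word "short texts".
docs = [
    ["car"], ["car", "auto"], ["auto"],                              # vehicles
    ["fruit"], ["fruit", "apple"], ["apple"], ["apple", "fruit"],    # food
]
vocab = ["car", "auto", "fruit", "apple"]
a = term_doc_matrix(docs, vocab)
raw = [[a[t][d] for t in range(len(vocab))] for d in range(len(docs))]
lsa = lsa_doc_vectors(a, k=2)

print(round(cosine(raw[0], raw[2]), 3))       # "car" vs "auto", raw terms: 0.0
print(round(cosine(lsa[0], lsa[2]), 3))       # same pair in LSA space: 1.0
print(abs(cosine(lsa[0], lsa[3])) < 1e-9)     # "car" vs "fruit" stays unrelated: True
```

The documents "car" and "auto" share no term, so their raw cosine similarity is 0, yet after projection onto the top k = 2 latent dimensions they coincide, because the two terms co-occur in a third document. In the pipeline the abstract describes, ICA would then be applied to these reduced document vectors; that step is not shown here.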


Keywords: Text Classification, Independent Component Analysis, Latent Semantic Analysis, Feature Term, Document Corpus





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Qiang Pu (1)
  • Guo-Wei Yang (1)
  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
