Robust Discriminant Analysis of Latent Semantic Feature for Text Categorization

  • Jiani Hu
  • Weihong Deng
  • Jun Guo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4223)


This paper proposes a Discriminative Semantic Feature (DSF) method for text categorization based on the vector space model. The DSF method has two stages. It first reduces the dimension of the document vector space with Latent Semantic Indexing (LSI). It then applies a Robust linear Discriminant analysis Model (RDM), which improves classical LDA through an energy-adaptive regularization criterion, to extract discriminative semantic features with enhanced discrimination power. As a result, the DSF method not only uncovers the latent semantic structure of the documents but also captures discriminative features. DSF is also compared experimentally with several state-of-the-art dimension reduction schemes: LSI, orthogonal centroid, two-stage LSI+LDA, LDA/QR, and LDA/GSVD. Experiments on the Reuters-21578 text collection show that the proposed method performs better than the other algorithms.
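The two-stage pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact energy-adaptive regularization criterion, so the hypothetical `regularize_within` assumes one plausible rule: keep the leading eigenvalues of the within-class scatter that carry a fraction `energy` (assumed 0.95) of the spectrum, and raise the remaining small, poorly estimated eigenvalues to the smallest retained one. All function names, the `energy` parameter, and the synthetic Poisson "documents" are illustrative.

```python
import numpy as np


def lsi(A, k):
    """Latent Semantic Indexing: project documents onto the top-k left singular
    vectors of the (n_terms x n_docs) term-document matrix A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    return U_k, U_k.T @ A  # basis (n_terms, k) and document coordinates (k, n_docs)


def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter for column samples X (d, n)."""
    d = X.shape[0]
    total_mean = X.mean(axis=1, keepdims=True)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mc) @ (Xc - mc).T
        Sb += Xc.shape[1] * (mc - total_mean) @ (mc - total_mean).T
    return Sw, Sb


def regularize_within(Sw, energy=0.95):
    """ASSUMED energy-adaptive rule (hypothetical, not taken from the paper): keep
    the leading eigenvalues holding `energy` of the spectrum and raise the rest
    to the smallest retained eigenvalue."""
    w, V = np.linalg.eigh(Sw)
    w, V = np.clip(w[::-1], 0.0, None), V[:, ::-1]  # descending; drop round-off negatives
    cum = np.cumsum(w) / w.sum()
    m = min(int(np.searchsorted(cum, energy)) + 1, len(w))  # number of retained eigenvalues
    w_reg = w.copy()
    w_reg[m:] = w[m - 1]
    return V, w_reg


def dsf_fit(A, y, k, energy=0.95):
    """Stage 1: LSI to k dimensions. Stage 2: LDA with the regularized within-class
    scatter. Returns an (n_terms, c-1) matrix mapping term vectors to features."""
    U_k, X = lsi(A, k)
    Sw, Sb = scatter_matrices(X, y)
    V, w_reg = regularize_within(Sw, energy)
    W = V / np.sqrt(w_reg)  # whitens the regularized Sw: W.T @ Sw_reg @ W = I
    _, Q = np.linalg.eigh(W.T @ Sb @ W)
    n_classes = len(np.unique(y))
    Q = Q[:, ::-1][:, : n_classes - 1]  # leading between-class directions
    return U_k @ (W @ Q)


# Toy usage: three classes, each favouring its own block of 10 terms (raw counts, no tf-idf).
rng = np.random.default_rng(0)
n_terms, per_class = 30, 20
docs, labels = [], []
for c in range(3):
    rate = np.full((n_terms, 1), 0.3)
    rate[c * 10:(c + 1) * 10] = 3.0
    docs.append(rng.poisson(rate, size=(n_terms, per_class)).astype(float))
    labels += [c] * per_class
A, y = np.hstack(docs), np.array(labels)

G = dsf_fit(A, y, k=10)
Z = G.T @ A  # (2, 60) discriminative semantic features
centroids = np.stack([Z[:, y == c].mean(axis=1) for c in range(3)], axis=1)
pred = np.argmin(((Z[:, :, None] - centroids[:, None, :]) ** 2).sum(axis=0), axis=1)
acc = (pred == y).mean()
print(G.shape, acc)
```

Whitening with the regularized within-class scatter and then diagonalizing the between-class scatter solves the same generalized eigenproblem as classical LDA, only with the modified spectrum. If the regularization step is skipped and the within-class scatter is nonsingular, the second stage reduces to the plain two-stage LSI+LDA baseline named in the abstract.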


Keywords: Text Categorization, Semantic Feature, Latent Semantic Analysis, Vector Space Model, Latent Semantic Indexing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiani Hu (1)
  • Weihong Deng (1)
  • Jun Guo (1)

  1. Beijing University of Posts and Telecommunications, Beijing, China