Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus

Neural Processing Letters

Abstract

This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that the method yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire problem of indexing text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived from the a priori constraints on the problem. The principal advantage of the approach is computational efficiency, obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.
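
To make the preprocessing stage concrete, the following is a minimal sketch of Latent Semantic Indexing as a truncated singular value decomposition of a term-document matrix. It illustrates only the generic LSI step and does not reproduce the paper's projection-based factorisation, which the abstract does not specify. The function name `lsi_project`, the toy count matrix `X`, and the choice `k = 2` are assumptions made for illustration.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Return the top-k LSI basis and the documents' coordinates in it.

    term_doc is an (n_terms, n_docs) matrix of term counts or weights.
    This is a generic truncated-SVD sketch, not the paper's algorithm.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]  # (n_terms, k) orthonormal basis of the latent space
    # Document coordinates in the latent space: U_k^T X = diag(s_k) @ Vt_k.
    docs_k = s[:k, None] * Vt[:k, :]  # (k, n_docs)
    return U_k, docs_k


# Hypothetical toy corpus: 5 terms (rows) by 4 documents (columns).
X = np.array(
    [
        [2, 1, 0, 0],
        [1, 2, 0, 0],
        [0, 0, 3, 1],
        [0, 0, 1, 2],
        [1, 0, 0, 1],
    ],
    dtype=float,
)

U_k, docs_k = lsi_project(X, k=2)
print(docs_k.shape)  # each document is now a 2-dimensional vector
```

Any later feature-extraction step then operates on the k-by-n_docs matrix `docs_k` rather than on the full n_terms-by-n_docs matrix, which is where the computational saving described in the abstract would come from.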

Cite this article

Kabán, A., Girolami, M.A. Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus. Neural Processing Letters 15, 31–43 (2002). https://doi.org/10.1023/A:1013801028884

  • DOI: https://doi.org/10.1023/A:1013801028884
