Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus

Kabán, A.; Girolami, M. A.

doi:10.1023/A:1013801028884


Published: February 2002

Volume 15, pages 31–43, (2002)

Neural Processing Letters


Abstract

This paper proposes a projection-based symmetrical factorisation method for extracting semantic features from collections of text documents stored in a Latent Semantic space. Preliminary experimental results demonstrate that this yields a representation comparable to that provided by a novel probabilistic approach, which reconsiders the entire indexing problem of text documents and works directly in the original high-dimensional vector-space representation of text. The projection index employed is derived here from the a priori constraints on the problem. The principal advantage of this approach is computational efficiency, which is obtained by exploiting Latent Semantic Indexing as a preprocessing stage. Simulation results on subsets of the 20-Newsgroups text corpus in various settings are provided.
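
As an illustration of the preprocessing stage only, Latent Semantic Indexing can be sketched as a truncated singular value decomposition of a term-by-document matrix. The snippet below is a minimal sketch under stated assumptions, not the authors' method: the function name `lsi_project`, the toy matrix `X` and the choice `k=2` are hypothetical, and the projection-based factorisation proposed in the paper is not reproduced here.

```python
import numpy as np


def lsi_project(term_doc, k):
    """Project documents into a k-dimensional latent semantic space.

    term_doc: (n_terms, n_docs) term-by-document matrix.
    Returns the top-k left singular vectors, the top-k singular values,
    and the (k, n_docs) document coordinates in the latent space.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    s_k = s[:k]
    # Document coordinates; algebraically equal to U_k.T @ term_doc
    doc_coords = s_k[:, None] * Vt[:k, :]
    return U_k, s_k, doc_coords


# Toy term-by-document count matrix (hypothetical data): 5 terms, 4 documents
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

U_k, s_k, Z = lsi_project(X, k=2)
print(Z.shape)
```

Any feature extraction applied afterwards would operate on the small `k`-by-documents matrix `Z` rather than on the original high-dimensional term space, which is where a computational saving of the kind the abstract describes would be expected to come from.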



Author information

Authors and Affiliations

Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FIN-02015, HUT, Finland
A. Kabán & M. A. Girolami


About this article

Cite this article

Kabán, A., Girolami, M.A. Fast Extraction of Semantic Features from a Latent Semantic Indexed Text Corpus. Neural Processing Letters 15, 31–43 (2002). https://doi.org/10.1023/A:1013801028884


Issue Date: February 2002
DOI: https://doi.org/10.1023/A:1013801028884
