Kernel methods such as support vector machines have been used successfully for text categorization. A standard choice of kernel function has been the inner product between the vector-space representations of two documents, in analogy with classical information retrieval (IR) approaches.
Latent semantic indexing (LSI) has been used successfully in IR as a technique for capturing semantic relations between terms and incorporating them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost.
In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space.
We provide experimental results demonstrating that the approach can significantly improve performance and that, where it yields no gain, it does not impair performance.
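
The idea of carrying out LSI in a kernel-defined feature space can be illustrated with a minimal sketch. It assumes a precomputed, symmetric positive semi-definite kernel (Gram) matrix `K` over the training documents and uses NumPy; the function names, the toy matrix `D`, and the choice of `k` are illustrative assumptions and are not taken from the paper. The sketch relies on the standard fact that the eigenvectors of `K` give the document-side singular vectors of the term-document matrix, so the rank-`k` LSI projection can be computed from kernel values alone.

```python
import numpy as np


def lsi_in_kernel_space(K, k):
    """Top-k eigenvalues and eigenvectors of a symmetric PSD Gram matrix K."""
    # eigh is meant for symmetric matrices and returns eigenvalues in
    # ascending order, so we reorder to take the k largest.
    eigvals, eigvecs = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]


def latent_semantic_kernel(K, k):
    """Gram matrix of the documents after projection onto the top-k
    latent directions: K_k = V_k diag(lam_k) V_k^T."""
    lam, V = lsi_in_kernel_space(K, k)
    return (V * lam) @ V.T


def project_new_document(k_new, lam, V):
    """k-dimensional latent coordinates of a document, given only its
    kernel values k_new against the training documents.
    Assumes the retained eigenvalues lam are strictly positive."""
    return (V.T @ k_new) / np.sqrt(lam)


# Toy data (hypothetical): rows are documents, columns are term counts.
D = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 2.0, 1.0],
])

# Linear (inner-product) kernel between vector-space representations.
K = D @ D.T

# Rank-2 latent semantic kernel, computed without touching term space.
K2 = latent_semantic_kernel(K, k=2)
```

With the linear kernel used here, `K2` should match the Gram matrix obtained by applying a rank-2 truncated SVD to `D` directly; the point of the kernel formulation is that the same computation goes through when only `K` is available, for example with a nonlinear kernel whose feature space is never constructed explicitly.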