Latent Semantic Kernels

  • Nello Cristianini
  • John Shawe-Taylor
  • Huma Lodhi

Abstract

Kernel methods such as support vector machines have been used successfully for text categorization. A standard choice of kernel function has been the inner product between the vector-space representations of two documents, in analogy with classical information retrieval (IR) approaches.
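As a rough illustration of the standard kernel mentioned above, the following sketch computes the inner product between term-count vectors of two documents. The whitespace tokenizer, the function names `bag_of_words` and `linear_kernel`, and the toy documents are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to its term counts (a sparse vector-space representation)."""
    return Counter(text.lower().split())

def linear_kernel(doc_a, doc_b):
    """Inner product of the two term-count vectors."""
    a, b = bag_of_words(doc_a), bag_of_words(doc_b)
    # Terms missing from b contribute 0 (Counter returns 0 for unseen keys).
    return sum(count * b[term] for term, count in a.items())

# Hypothetical toy corpus, used only to show the shape of the Gram matrix.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "kernels map text to vectors",
]
gram = [[linear_kernel(x, y) for y in docs] for x in docs]
```

Documents that share no terms get a kernel value of zero here, even if they are semantically related; this is the gap that latent semantic methods aim to close.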

Latent semantic indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost.
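A minimal sketch of LSI in its usual explicit form may help here, assuming a small dense term-by-document matrix: documents are projected onto the top `k` left singular vectors, and similarity is measured in that reduced space. The matrix values and the name `lsi_similarity` are illustrative assumptions. The singular value decomposition is also the step that becomes expensive for large vocabularies.

```python
import numpy as np

# Toy term-by-document matrix (rows: terms, columns: documents); values are made up.
D = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

def lsi_similarity(D, k):
    """Document-by-document similarities after projecting onto k latent dimensions."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    U_k = U[:, :k]        # top-k left singular vectors (latent "concepts")
    Z = U_k.T @ D         # k x n_docs coordinates in the latent space
    return Z.T @ Z        # inner products between projected documents

S = lsi_similarity(D, k=2)
```

Keeping all latent dimensions recovers the plain inner-product similarities `D.T @ D`; reducing `k` merges terms that tend to co-occur.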

In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space.
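One way to picture this, offered as a sketch under stated assumptions rather than as the paper's exact procedure, is to eigendecompose the Gram matrix K of kernel values between training documents and keep only the top `k` eigenpairs. Everything below uses kernel values alone; the names `latent_semantic_kernel` and `project_new` are hypothetical, and the top `k` eigenvalues are assumed to be strictly positive.

```python
import numpy as np

def _top_eigenpairs(K, k):
    """Return the k largest eigenvalues of symmetric K and their eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(K)      # eigh returns ascending order
    top = np.argsort(eigvals)[::-1][:k]
    return eigvals[top], eigvecs[:, top]

def latent_semantic_kernel(K, k):
    """Rank-k kernel matrix V_k diag(lam_k) V_k^T built from K alone."""
    lam_k, V_k = _top_eigenpairs(K, k)
    return (V_k * lam_k) @ V_k.T

def project_new(K, k_new, k):
    """Latent coordinates of a new document, given its kernel values k_new
    against the training documents (assumes positive top-k eigenvalues)."""
    lam_k, V_k = _top_eigenpairs(K, k)
    return (V_k.T @ k_new) / np.sqrt(lam_k)

# Toy usage with an explicit linear kernel, so the result can be inspected.
rng = np.random.default_rng(0)
D_toy = rng.random((6, 4))                    # 6 terms, 4 documents (made-up data)
K_toy = D_toy.T @ D_toy
K_lsk = latent_semantic_kernel(K_toy, k=2)
```

With a linear kernel this gives the same document similarities as projecting onto the top `k` left singular vectors of the term-by-document matrix, but the same code applies unchanged to any positive semidefinite kernel matrix.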

We provide experimental results showing that the approach can significantly improve performance and that, where it does not, it does not impair it.

Keywords: kernel methods, latent semantic indexing, latent semantic kernels, Gram-Schmidt kernels, text categorization

Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Nello Cristianini (1)
  • John Shawe-Taylor (1)
  • Huma Lodhi (1)

  1. Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, UK
