Kernels for Text Analysis

  • Evgeni Tsivtsivadze
  • Tapio Pahikkala
  • Jorma Boberg
  • Tapio Salakoski
Part of the Studies in Computational Intelligence book series (SCI, volume 116)


During the past decade, kernel methods have proved successful in a variety of text analysis tasks. Several properties make kernel-based methods applicable to many real-world problems, especially in domains where the data are not naturally represented in vector form. Firstly, instead of requiring manual construction of a feature space for the learning task, kernel functions provide an alternative way to design useful features automatically, allowing very rich representations. Secondly, kernels can be designed to incorporate prior knowledge about the domain; this property notably improves the performance of general learning methods and simplifies their adaptation to a specific problem. Finally, kernel methods are naturally applicable in situations where the data are not in vectorial form, thus avoiding an extensive preprocessing step. In this chapter, we present the main ideas behind kernel methods in general and kernels for text analysis in particular, and provide an example of designing a feature space for the parse ranking problem with different kernel functions.
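As an illustrative sketch (not code from the chapter), the idea of computing similarity between texts without hand-building a feature vector can be shown with a simple k-spectrum string kernel: each string is implicitly mapped to counts of its length-k substrings, and the kernel is the inner product of those count vectors. The function name and parameters below are chosen for illustration only.

```python
from collections import Counter

def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """k-spectrum string kernel: inner product of the implicit
    feature maps that count all length-k substrings of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Sum over shared substrings; absent keys contribute zero.
    return sum(cs[sub] * ct[sub] for sub in cs)

print(spectrum_kernel("kernel", "kernels", k=2))  # shared bigrams: ke, er, rn, ne, el -> 5
```

The feature space here (all possible k-grams) is never materialized; the kernel touches only substrings that actually occur, which is what makes such representations tractable even when the implicit space is very large.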


Keywords: Kernel function · Feature space · Kernel method · Link length · Link type





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Evgeni Tsivtsivadze (1)
  • Tapio Pahikkala (1)
  • Jorma Boberg (1)
  • Tapio Salakoski (1)
  1. Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Turku, Finland
