Applied Intelligence, 31:81

Locality kernels for sequential data and their applications to parse ranking

  • Evgeni Tsivtsivadze
  • Tapio Pahikkala
  • Jorma Boberg
  • Tapio Salakoski

Abstract

We propose a framework for constructing kernels that take advantage of local correlations in sequential data. Kernels designed within this framework measure parse similarity locally, within a small window constructed around each matching feature. We further propose incorporating positional information inside the window and consider different ways of doing so. We applied the kernels together with the regularized least-squares (RLS) algorithm to the task of dependency parse ranking, using a dataset of parses obtained from a manually annotated biomedical corpus of 1100 sentences. Our experiments show that RLS with kernels incorporating positional information performs better than RLS with the baseline kernel functions, and that this performance gain is statistically significant.
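
As a rough illustration of the idea of comparing sequences locally around matching features, the following Python sketch may help. It is not the kernel definition from the article: the function names `window` and `locality_kernel`, the `radius` parameter, the `position_sensitive` switch, and the plain 0/1 counting are all assumptions made for this example. For every pair of positions that carry the same feature, it counts how many features the two surrounding windows share, optionally requiring that shared features sit at the same offset from the window centre.

```python
from typing import Hashable, List, Sequence, Tuple


def window(seq: Sequence[Hashable], center: int, radius: int) -> List[Tuple[int, Hashable]]:
    """Return (offset, feature) pairs for positions within `radius` of `center`.

    Windows are truncated at the sequence boundaries (an assumption of this sketch).
    """
    lo = max(0, center - radius)
    hi = min(len(seq), center + radius + 1)
    return [(k - center, seq[k]) for k in range(lo, hi)]


def locality_kernel(
    s: Sequence[Hashable],
    t: Sequence[Hashable],
    radius: int = 1,
    position_sensitive: bool = True,
) -> float:
    """Hypothetical locality-style similarity between two feature sequences.

    For each pair of positions (i, j) with s[i] == t[j], count the features
    shared by the windows around i and j. With `position_sensitive=True`,
    a shared feature only counts if it has the same offset in both windows.
    """
    total = 0.0
    for i, a in enumerate(s):
        for j, b in enumerate(t):
            if a != b:
                continue  # windows are only built around matching features
            for off_s, f_s in window(s, i, radius):
                for off_t, f_t in window(t, j, radius):
                    if f_s != f_t:
                        continue
                    if position_sensitive and off_s != off_t:
                        continue
                    total += 1.0
    return total
```

With the sequences `["a", "b"]` and `["b", "a"]`, the position-sensitive variant counts only the window centres, while the position-insensitive variant also counts the swapped neighbours, so the two variants give different values. Values computed this way could fill the kernel matrix passed to a kernel-based learner such as RLS.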

Keywords

Kernel methods; Parse ranking; Regularized least-squares; Natural language processing

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Evgeni Tsivtsivadze (1)
  • Tapio Pahikkala (1)
  • Jorma Boberg (1)
  • Tapio Salakoski (1)

  1. Turku Centre for Computer Science (TUCS), Department of Information Technology, University of Turku, Turku, Finland