Artificial Intelligence Review, Volume 46, Issue 1, pp 41–58

Kernel methods for word sense disambiguation

  • Xiangjun Li
  • Song Qing
  • Huawei Zhang
  • Tinghua Wang
  • Huping Yang

Abstract

Many natural language processing (NLP) applications require accurate resolution of the ambiguities inherent in natural language. The task of resolving lexical ambiguity is called word sense disambiguation (WSD): determining the correct sense for an instance of a polysemous word. Kernel methods, among the most popular machine learning approaches, have attracted significant interest in recent years and have performed well across a wide variety of learning tasks. In this paper, we survey the research progress of kernel-based WSD techniques. We first introduce preliminaries on WSD and kernel methods. We then review the main approaches in the literature, focusing on three issues: context representation, kernel design, and learning algorithms. We also provide further discussion of the kernel-based WSD approaches. Finally, we discuss open problems and future directions.

Keywords

Word sense disambiguation (WSD); Lexical ambiguity; Kernel method; Classification; Natural language processing (NLP)

Notes

Acknowledgments

The authors thank all the referees for their constructive and insightful comments on this paper. The corresponding author also thanks the China Scholarship Council (No. 201308360053) for financial support as a visiting scholar conducting research with Prof. Peter X. Liu at Carleton University, and thanks Prof. Peter X. Liu and Dr. Shichao Liu at Carleton University for valuable discussions. This work is supported in part by the National Natural Science Foundation of China (Nos. 51367014, 61202265, 61462040 and 61262049), the Jiangxi Province Natural Science Foundation of China (Nos. 20142BAB207011 and 20142BAB217016), the Jiangxi Province Education Plan of Young Scientists Foundation of China (No. 20112BCB23004), the Jiangxi Province Science and Technology Support Plan Key Projects of China (No. 20111BBE50008), and the Science and Technology Plan Projects in Jiangxi Province Education Bureau of China (Nos. GJJ14770 and YC2015-S035).

Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  • Xiangjun Li (1, 2)
  • Song Qing (1)
  • Huawei Zhang (1)
  • Tinghua Wang (3)
  • Huping Yang (4)

  1. Department of Computer Science and Technology, Nanchang University, Nanchang, People’s Republic of China
  2. Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada
  3. School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, People’s Republic of China
  4. Department of Electrical and Automation Engineering, Nanchang University, Nanchang, People’s Republic of China
