Abstract
Many applications of natural language processing (NLP) require accurate resolution of the ambiguities inherent in natural language. One such task is word sense disambiguation (WSD), which determines the correct sense of each instance of a polysemous word. Kernel methods, among the most popular machine learning approaches, have attracted significant interest in recent years and have performed well across a wide variety of learning tasks. In this paper, we survey the research progress on kernel-based WSD techniques. We first introduce preliminary background on WSD and kernel methods. We then review the main approaches in the literature, focusing on three issues: context representation, kernel design, and learning algorithms. We also provide further discussion of kernel-based WSD approaches. Finally, we discuss open problems and future directions.
Notes
A set has closure under an operation if applying that operation to members of the set always produces a member of the same set; in this case we also say that the set is closed under the operation.
For a fixed c, we always take the largest context \({\varvec{x}}=(t_{-bl} ,\ldots ,t_{-1} ,t_{1},\ldots ,t_{br})\) such that \(bl\le c\) and \(br\le c\). Note that if at least c words both precede and follow the word to be disambiguated, then \(bl=br=c\); otherwise \(bl<c\) or \(br<c\).
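The window rule in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `context_window`, the token-list input, and the zero-based `target_index` are assumptions introduced here.

```python
def context_window(tokens, target_index, c):
    """Return the (left, right) context of the token at target_index.

    Each side holds at most c tokens; a side is shorter only when the
    sentence boundary is reached first, so len(left) = bl <= c and
    len(right) = br <= c. The target word itself is excluded.
    """
    if not 0 <= target_index < len(tokens):
        raise IndexError("target_index is outside the token list")
    if c < 0:
        raise ValueError("c must be non-negative")
    # Clamp the left edge at 0 so a target near the start keeps bl < c.
    left = tokens[max(0, target_index - c):target_index]
    # Slicing past the end of a list is safe and simply yields br < c.
    right = tokens[target_index + 1:target_index + 1 + c]
    return left, right
```

For the sentence "he sat on the bank of the river" with target "bank" (index 4) and c = 3, both sides are full, so bl = br = 3. With target "sat" (index 1), only one word precedes it, so bl = 1 < c while br = 3.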
This definition is the so-called gap-weighted subsequences kernel, which is one of the most general types of kernels defined on sequences.
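Since the definition itself is not reproduced in these notes, the following sketch assumes the standard formulation of the gap-weighted subsequences kernel: every subsequence of length p shared by the two sequences contributes λ raised to the sum of the spans it occupies in each sequence. The function name `gap_weighted_kernel` and the parameter names `p` and `lam` are illustrative choices, and the recursion is the usual dynamic-programming scheme rather than anything specific to this survey.

```python
from functools import lru_cache


def gap_weighted_kernel(s, t, p, lam):
    """Gap-weighted subsequences kernel of order p with decay lam.

    s and t are sequences of hashable symbols (characters or word tokens).
    """
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_prime(i, m, n):
        # Auxiliary kernel on prefixes s[:m], t[:n]: like the kernel of
        # order i, but spans are measured up to the end of each prefix.
        if i == 0:
            return 1.0
        if min(m, n) < i:
            return 0.0
        x = s[m - 1]
        total = lam * k_prime(i, m - 1, n)
        for j in range(n):
            if t[j] == x:
                # Match x at position j of t; pay for the gap to the end of t[:n].
                total += k_prime(i - 1, m - 1, j) * lam ** (n - j + 1)
        return total

    @lru_cache(maxsize=None)
    def k(m, n):
        # Kernel of order p on prefixes s[:m], t[:n].
        if min(m, n) < p:
            return 0.0
        x = s[m - 1]
        total = k(m - 1, n)
        for j in range(n):
            if t[j] == x:
                # The last matched symbol costs lam once in each sequence.
                total += k_prime(p - 1, m - 1, j) * lam ** 2
        return total

    return k(len(s), len(t))
```

As a quick check, `gap_weighted_kernel("cat", "car", 2, lam)` equals λ⁴, because the only shared subsequence of length 2 is "ca", which spans two symbols in each string. The recursion depth grows with sequence length, so this sketch suits sentence-sized inputs rather than long documents.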
Acknowledgments
The authors would like to thank all the referees for their constructive and insightful comments on this paper. The corresponding author also thanks the China Scholarship Council (No. 201308360053) for financial support as a visiting scholar working with Prof. Peter X. Liu at Carleton University, and thanks Prof. Peter X. Liu and Dr. Shichao Liu at Carleton University for valuable discussions. This work is supported in part by the National Natural Science Foundation of China (Nos. 51367014, 61202265, 61462040 and 61262049), the Jiangxi Province Natural Science Foundation of China (Nos. 20142BAB207011 and 20142BAB217016), the Jiangxi Province Education Plan of Young Scientists Foundation of China (No. 20112BCB23004), the Jiangxi Province Science and Technology Support Plan Key Projects of China (No. 20111BBE50008), and the Science and Technology Plan Projects in Jiangxi Province Education Bureau of China (Nos. GJJ14770 and YC2015-S035).
Cite this article
Li, X., Qing, S., Zhang, H. et al. Kernel methods for word sense disambiguation. Artif Intell Rev 46, 41–58 (2016). https://doi.org/10.1007/s10462-015-9455-5
DOI: https://doi.org/10.1007/s10462-015-9455-5