Abstract
In machine learning applications with natural language inputs, the words and their positions in the input text are among the most important features. In this article, we introduce a framework based on a word-position matrix representation of text, linear feature transformations of the word-position matrices, and kernel functions constructed from the transformations. We consider two categories of transformations, one based on word similarities and the other on positional similarities, which the framework can apply simultaneously in an elegant way. We show how word and positional similarities obtained with previously proposed techniques, such as latent semantic analysis, can be incorporated as transformations in the framework, and we also introduce novel ways to determine word and positional similarities. We further present efficient algorithms for computing kernel functions that incorporate the transformations on the word-position matrices and, more importantly, introduce a highly efficient method for prediction. The framework is particularly suitable for natural language disambiguation tasks, where the aim is to select, for a single word, a particular property from a set of candidates based on the word's context. We demonstrate the applicability of the framework to this type of task using context-sensitive spelling error correction on the Reuters News corpus as a model problem.
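The core idea of the abstract can be illustrated with a minimal sketch: each context is encoded as a binary word-position matrix, a word-similarity transformation acts on its rows, a positional transformation acts on its columns, and the kernel is the Frobenius inner product of the transformed matrices. All names, the toy vocabulary, and the similarity values below are invented for illustration; the article's actual transformations (e.g. those derived from latent semantic analysis) would replace the hand-made matrices.

```python
import numpy as np

# Hypothetical toy setup: a vocabulary of 5 words and contexts of 4 positions.
vocab = {"bank": 0, "river": 1, "money": 2, "flows": 3, "loan": 4}
n_words, n_pos = len(vocab), 4

def word_position_matrix(tokens):
    """Binary word-position matrix: entry (w, p) = 1 iff word w occurs at position p."""
    D = np.zeros((n_words, n_pos))
    for p, tok in enumerate(tokens):
        D[vocab[tok], p] = 1.0
    return D

# Word-similarity transformation S (acts on rows) -- in the framework this could
# come from latent semantic analysis; here "money" and "loan" are simply declared
# partially similar by hand.
S = np.eye(n_words)
S[vocab["money"], vocab["loan"]] = S[vocab["loan"], vocab["money"]] = 0.5

# Positional transformation P (acts on columns) -- a smoothing that lets nearby
# positions match each other, with a Gaussian-like decay in positional distance.
P = np.array([[np.exp(-0.5 * (i - j) ** 2) for j in range(n_pos)]
              for i in range(n_pos)])

def kernel(tokens_a, tokens_b):
    """Kernel between two contexts under the transformations S and P."""
    A = S @ word_position_matrix(tokens_a) @ P
    B = S @ word_position_matrix(tokens_b) @ P
    return float(np.sum(A * B))  # Frobenius inner product <S A P, S B P>

k = kernel(["river", "flows", "bank", "money"],
           ["river", "flows", "bank", "loan"])
```

Because both transformations are linear, they compose into a single feature map, so the kernel remains a valid inner product and can be plugged into any kernel machine; the efficient evaluation and prediction algorithms are the subject of the article itself.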
Editor: Dan Roth.
Pahikkala, T., Pyysalo, S., Boberg, J. et al. Matrix representations, linear transformations, and kernels for disambiguation in natural language. Mach Learn 74, 133–158 (2009). https://doi.org/10.1007/s10994-008-5082-6