Effective Kernelized Online Learning in Language Processing Tasks

  • Simone Filice
  • Giuseppe Castellucci
  • Danilo Croce
  • Roberto Basili
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)


Kernel-based methods enable robust and effective learning in NLP tasks, but their inherent complexity also affects Online Learning (OL) scenarios, where time and memory usage grow as new examples arrive. This paper extends a state-of-the-art budgeted OL algorithm to integrate complex kernels efficiently by constraining the overall complexity. Principles of Fairness and Weight Adjustment are applied to mitigate data imbalance and improve model stability. Results on Sentiment Analysis in Twitter and Question Classification show that performance very close to the state of the art achieved by batch algorithms can be obtained.
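
To illustrate why a budget matters in kernelized online learning, here is a minimal Python sketch. It is not the algorithm proposed in the paper. It assumes a passive-aggressive (PA-I) style update, an RBF kernel, and a deliberately simple maintenance policy that discards the oldest support vector once the budget is exceeded. The names `BudgetedKernelPA`, `rbf_kernel`, `budget`, and `C` are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length numeric sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


class BudgetedKernelPA:
    """Illustrative budgeted kernel learner (assumed design, not the paper's).

    The model is a list of (support vector, signed weight) pairs, so each
    prediction costs one kernel computation per stored support vector.
    """

    def __init__(self, kernel, budget, C=1.0):
        self.kernel = kernel
        self.budget = budget      # maximum number of stored support vectors
        self.C = C                # cap on the size of a single update
        self.support = deque()    # oldest support vector sits on the left

    def score(self, x):
        return sum(alpha * self.kernel(sv, x) for sv, alpha in self.support)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one example with label y in {-1, +1}."""
        loss = max(0.0, 1.0 - y * self.score(x))   # hinge loss
        if loss == 0.0:
            return                                  # passive step: no change
        tau = min(self.C, loss / self.kernel(x, x)) # PA-I step size
        self.support.append((x, tau * y))
        if len(self.support) > self.budget:
            self.support.popleft()                  # assumed policy: drop oldest
```

Without the final check in `update`, the support set would grow with every example that incurs a loss, and so would the cost of each call to `score`. The budget keeps both memory and per-example time bounded, at the price of discarding information; more careful maintenance policies than dropping the oldest vector exist, and this sketch makes no claim about which one the paper adopts.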


Support Vector, Online Learn, Sentiment Analysis, Kernel Computation, Weight Adjustment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Simone Filice (1)
  • Giuseppe Castellucci (2)
  • Danilo Croce (3)
  • Roberto Basili (3)
  1. DICII, University of Roma, Roma, Italy
  2. DIE, University of Roma, Roma, Italy
  3. DII, University of Roma, Roma, Italy
