Transductive Learning with String Kernels for Cross-Domain Text Classification

  • Radu Tudor Ionescu
  • Andrei Madalin Butnaru
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11303)

Abstract

For many text classification tasks, a major problem is the lack of labeled data in the target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks, such as native language identification and automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches that further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.
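The abstract does not spell out the two transductive approaches, so the following is only a minimal, dependency-free sketch under stated assumptions. It shows (a) a blended character p-gram intersection kernel, similar to the string kernels used in this line of work, with an arbitrarily chosen p-gram range of 3 to 5, and (b) a generic self-training loop, one common transductive strategy, that pseudo-labels the most confident unlabeled test documents and adds them to the training set. The function names `intersection_kernel`, `normalized_kernel`, `centroid_score` and `self_train`, and the nearest-centroid scoring rule that stands in for a real kernel classifier, are hypothetical. They are not taken from the paper, and the abstract does not say whether either of its two approaches takes this form.

```python
from collections import Counter
from math import sqrt


def char_pgrams(text: str, p: int) -> Counter:
    """Count every contiguous character p-gram in text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def intersection_kernel(a: str, b: str, p_min: int = 3, p_max: int = 5) -> float:
    """Blended intersection kernel: sum over p of sum_g min(count_a(g), count_b(g)).

    The p-gram range 3..5 is an illustrative assumption, not a value from the paper.
    """
    total = 0
    for p in range(p_min, p_max + 1):
        ca, cb = char_pgrams(a, p), char_pgrams(b, p)
        total += sum(min(n, cb[g]) for g, n in ca.items() if g in cb)
    return float(total)


def normalized_kernel(a: str, b: str) -> float:
    """Cosine-style normalization: k(x, x) == 1 whenever x has at least p_min characters."""
    denom = sqrt(intersection_kernel(a, a) * intersection_kernel(b, b))
    return intersection_kernel(a, b) / denom if denom else 0.0


def centroid_score(doc: str, train: list) -> float:
    """Hypothetical stand-in classifier: mean similarity to positive minus negative docs.

    Assumes train contains at least one example with label 1 and one with label -1.
    """
    pos = [normalized_kernel(doc, t) for t, y in train if y == 1]
    neg = [normalized_kernel(doc, t) for t, y in train if y == -1]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


def self_train(train: list, test: list, rounds: int = 2, per_round: int = 1) -> list:
    """Generic self-training: never reads test labels, only pseudo-labels test docs."""
    train = list(train)          # labeled source-domain (text, label) pairs; copied
    remaining = list(test)       # unlabeled target-domain texts
    for _ in range(rounds):
        if not remaining:
            break
        # Score every remaining test doc with the current training set.
        scores = {d: centroid_score(d, train) for d in remaining}
        # Pick the docs the current model is most confident about.
        confident = sorted(remaining, key=lambda d: abs(scores[d]), reverse=True)[:per_round]
        for doc in confident:
            train.append((doc, 1 if scores[doc] >= 0 else -1))  # add with pseudo-label
            remaining.remove(doc)
    # Final transductive predictions for all test docs, using the enlarged training set.
    return [1 if centroid_score(d, train) >= 0 else -1 for d in test]
```

As a usage example, `self_train(source_reviews, target_reviews)` would take labeled `(text, label)` pairs from the source domain and raw texts from the target domain. The nearest-centroid rule only keeps the sketch self-contained; a practical system would more likely feed the precomputed kernel matrix into a kernel classifier such as an SVM or kernel ridge regression.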

Keywords

Transductive learning, Domain adaptation, Cross-domain classification, String kernels, Sentiment analysis, Polarity classification

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Bucharest, Bucharest, Romania