Knowledge Transfer for Utterance Classification in Low-Resource Languages

  • Andrei Smirnov
  • Valentin Mendelev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)


The paper addresses the problem of short text classification in Kazakh. Traditional text classification approaches require labeled data to build accurate classifiers. However, the amount of available labeled data is usually very limited because of the high cost of labeling or data accessibility issues. We describe a method for constructing a classifier without labeled data in the target language. A convolutional neural network (CNN) is trained on labeled Russian texts, and a language vector space transform is used to transfer knowledge from Russian into Kazakh. Classification accuracy is evaluated on a dataset of customer support requests. The presented method achieves results competitive with those of an approach that relies on a sophisticated automatic translation system.
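To make the idea of a language vector space transform concrete, here is a minimal sketch. It rests on assumptions this abstract does not confirm: that the transform is a linear map fitted by least squares on a small seed dictionary of word pairs, and that Kazakh word vectors are projected into the Russian embedding space so that a classifier trained on Russian vectors could consume them. All names (`fit_linear_map`, `map_utterance`), the toy tokens, and the random data are hypothetical, and the CNN itself is omitted.

```python
import numpy as np


def fit_linear_map(src_vecs, tgt_vecs):
    """Return W minimising ||src_vecs @ W - tgt_vecs|| in the least-squares sense.

    src_vecs: (n_pairs, dim_src) source-language vectors for dictionary entries.
    tgt_vecs: (n_pairs, dim_tgt) vectors of their target-language translations.
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def map_utterance(tokens, src_embeddings, W):
    """Project the known tokens of an utterance into the target space.

    Tokens missing from the embedding table are skipped (an assumption made
    for this sketch). Returns an array of shape (n_known_tokens, dim_tgt).
    """
    rows = [src_embeddings[t] @ W for t in tokens if t in src_embeddings]
    if not rows:
        return np.zeros((0, W.shape[1]))
    return np.vstack(rows)


# Toy data: random vectors stand in for real Kazakh and Russian embeddings.
rng = np.random.default_rng(0)
dim_kk, dim_ru, n_pairs = 4, 3, 20
true_W = rng.normal(size=(dim_kk, dim_ru))
kk_seed = rng.normal(size=(n_pairs, dim_kk))   # "Kazakh" side of the dictionary
ru_seed = kk_seed @ true_W                     # "Russian" side (noiseless toy case)

W = fit_linear_map(kk_seed, ru_seed)

# Hypothetical embedding table with two known tokens; "unknown" is not in it.
kk_embeddings = {"salem": kk_seed[0], "rakhmet": kk_seed[1]}
mapped = map_utterance(["salem", "unknown", "rakhmet"], kk_embeddings, W)
```

In this setup `mapped` holds one projected vector per known token, which is the kind of input matrix a sentence-level CNN trained on Russian embeddings would expect. With real embeddings the fit would not be exact, and the quality of the seed dictionary would matter.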


Keywords: Text classification, Language vector space, Word embeddings, CNN, Low-resource



This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.579.21.0008, ID RFMEFI57914X0008.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. STC-Innovations, Saint Petersburg, Russia
  2. Speech Technology Center, Saint Petersburg, Russia
  3. ITMO University, Saint Petersburg, Russia
