Low-Resource Text Classification Using Domain-Adversarial Learning

  • Daniel Grießhaber
  • Ngoc Thang Vu
  • Johannes Maucher
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11171)

Abstract

Deep learning techniques have recently been shown to be successful in many natural language processing tasks, forming state-of-the-art systems. They require, however, a large amount of annotated data, which is often unavailable. This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource settings in new target domains or languages. In the case of new languages, we show that monolingual word vectors can be used directly for training without pre-alignment: their projection into a common space can be learned ad hoc at training time, reaching the final performance of pretrained multilingual word vectors.
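
The mechanism behind both claims can be made concrete. Below is a minimal PyTorch sketch of (a) a gradient-reversal layer that turns a domain classifier into an adversarial regularizer, in the style of Ganin et al.'s domain-adversarial training, and (b) a linear projection, learned jointly at training time, that maps monolingual target-language word vectors into the source embedding space. The class names, the simple averaging encoder, and all hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambda in the
    backward pass, so the encoder is trained to *fool* the domain head."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainAdversarialClassifier(nn.Module):
    def __init__(self, src_vectors, tgt_vectors, dim=300, hidden=128,
                 n_classes=2, n_domains=2):
        super().__init__()
        # Frozen monolingual embeddings; only the target-side projection
        # into the source space is learned ad hoc during training.
        self.src_emb = nn.Embedding.from_pretrained(src_vectors, freeze=True)
        self.tgt_emb = nn.Embedding.from_pretrained(tgt_vectors, freeze=True)
        self.projection = nn.Linear(dim, dim, bias=False)

        # Shared feature extractor (a bag-of-words encoder for brevity).
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        # Task head: trained on labeled (source) examples only.
        self.task_head = nn.Linear(hidden, n_classes)
        # Domain head: its gradient is reversed before reaching the
        # encoder, pushing the shared features toward domain invariance.
        self.domain_head = nn.Linear(hidden, n_domains)

    def forward(self, token_ids, domain, lambd=1.0):
        emb = self.src_emb(token_ids) if domain == 0 \
            else self.projection(self.tgt_emb(token_ids))
        features = self.encoder(emb.mean(dim=1))  # average over tokens
        task_logits = self.task_head(features)
        domain_logits = self.domain_head(
            GradientReversal.apply(features, lambd))
        return task_logits, domain_logits
```

In a training loop, the task loss would be computed only on labeled source batches, while the domain loss is computed on all batches; because the domain gradient reaches the encoder (and the projection) with its sign flipped, minimizing the domain loss simultaneously trains the domain head to separate the domains and the shared features to be indistinguishable across them, which is the regularizing effect described in the abstract.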

Keywords

NLP · Low-resource · Deep learning · Domain-adversarial

Notes

Acknowledgments

This research and development project is funded within the “Future of Work” Program by the German Federal Ministry of Education and Research (BMBF) and the European Social Fund in Germany. It is implemented by the Project Management Agency Karlsruhe (PTKA). The authors are responsible for the content of this publication.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Stuttgart Media University, Stuttgart, Germany
  2. Institute for Natural Language Processing (IMS), University of Stuttgart, Stuttgart, Germany
