Multi-label, Multi-class Classification Using Polylingual Embeddings

Balikas, Georgios; Amini, Massih-Reza

doi:10.1007/978-3-319-30671-1_59

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9626)

Included in the following conference series: European Conference on Information Retrieval

Abstract

We propose a Polylingual text Embedding (PE) strategy that learns a language-independent representation of texts using neural networks. We study the effects of bilingual representation learning for text classification and show empirically that the learned representations achieve better classification performance than traditional bag-of-words and other monolingual distributed representations. The performance gains are more significant in the interesting case where only a few labeled examples are available for training the classifiers.

Notes

1. http://statmt.org/.
2. https://code.google.com/p/word2vec/.
3. The code is available at http://ama.liglab.fr/~balikas/ecir2015.zip.

Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments. This work is partially supported by the CIFRE N 28/2015 and by the LabEx PERSYVAL Lab ANR-11-LABX-0025.

Author information

Authors and Affiliations

University of Grenoble-Alpes, Grenoble, France
Georgios Balikas & Massih-Reza Amini

Authors

Georgios Balikas
Massih-Reza Amini

Corresponding author

Correspondence to Georgios Balikas.

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Padova, Italy
Nicola Ferro
Faculty of Informatics, University of Lugano (USI), Lugano, Switzerland
Fabio Crestani
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Systèmes d’informations, Big Data et Recherche d’Information, Institut de Recherche en Informatique de Toulouse IRIT/équipe SIG, Toulouse Cedex 04, France
Josiane Mothe
Yahoo! Labs London, London, UK
Fabrizio Silvestri
Department of Information Engineering, University of Padua, Padova, Italy
Giorgio Maria Di Nunzio
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Department of Information Engineering, University of Padua, Padova, Italy
Gianmaria Silvello

About this paper

Cite this paper

Balikas, G., Amini, MR. (2016). Multi-label, Multi-class Classification Using Polylingual Embeddings. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science, vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_59

DOI: https://doi.org/10.1007/978-3-319-30671-1_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1
eBook Packages: Computer Science, Computer Science (R0)
