Abstract
Recently, Neural Vector Conceptualization (NVC) was proposed as a means to interpret samples from a word vector space. For NVC, a neural model activates the higher-order concepts it recognizes in a word vector instance. To this end, the model first needs to be trained on a sufficiently large instance-to-concept ground truth, which exists for only a few languages. In this work, we tackle this lack of resources with word vector space alignment techniques: we train the NVC model on a high-resource language and test it with vectors from an aligned word vector space of another language, without retraining or fine-tuning. A quantitative and qualitative analysis shows that the NVC model indeed activates meaningful concepts for unseen vectors from the aligned vector space. NVC thus becomes available for low-resource languages for which no appropriate concept ground truth exists.
Keywords
- Interpretability
- Explainability
- Word vector space
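The setup described in the abstract (train on one language, apply to aligned vectors of another without retraining) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the alignment is a plain orthogonal Procrustes solution over a seed dictionary rather than the method behind the aligned vectors used in the paper, and a multi-label logistic regression stands in for the neural NVC model. All names (`procrustes`, `train_concept_model`, `activate`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_words, n_concepts = 16, 200, 3

# Synthetic "high-resource" word vectors with multi-hot concept labels.
src = rng.normal(size=(n_words, dim))
labels = (src @ rng.normal(size=(dim, n_concepts)) > 0).astype(float)

# Synthetic "low-resource" space: the same words under an unknown rotation.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = src @ rotation


def procrustes(x, y):
    """Return the orthogonal matrix W that minimizes ||x @ W - y||_F."""
    u, _, vt = np.linalg.svd(x.T @ y)
    return u @ vt


def activate(weights, x):
    """Sigmoid concept activations for each row vector in x."""
    return 1.0 / (1.0 + np.exp(-(x @ weights)))


def train_concept_model(x, y, lr=0.5, epochs=500):
    """Multi-label logistic regression (stand-in for the NVC network)."""
    weights = np.zeros((x.shape[1], y.shape[1]))
    for _ in range(epochs):
        grad = x.T @ (activate(weights, x) - y) / len(x)
        weights -= lr * grad
    return weights


# 1) Learn the alignment from a small seed dictionary (first 50 word pairs).
w_align = procrustes(tgt[:50], src[:50])
tgt_aligned = tgt @ w_align

# 2) Train the concept model on source-language vectors only.
model = train_concept_model(src[:150], labels[:150])

# 3) Apply it, without retraining, to aligned vectors of held-out words.
held_out = slice(150, n_words)
activations = activate(model, tgt_aligned[held_out])
accuracy = ((activations > 0.5) == labels[held_out].astype(bool)).mean()
print(f"held-out concept accuracy on aligned vectors: {accuracy:.2f}")
```

Because the toy target space is an exact rotation of the source space, the alignment is recovered almost perfectly here. Real vector spaces are only approximately isomorphic, so activations on aligned vectors would be expected to be noisier.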
L. Raithel and R. Schwarzenberg—Shared first authorship.
Notes
- 1. Word vectors retrieved from https://fasttext.cc/docs/en/aligned-vectors.html on 2019/07/16.
- 2. Retrieved from https://en.wiktionary.org/wiki/Appendix:Mandarin_Frequency_lists on 2019/07/30.
Acknowledgements
This research was partially supported by the German Federal Ministry of Education and Research through the project DEEPLEE (01IW17001) and by Giance Technologies GmbH.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Raithel, L., Schwarzenberg, R. (2019). Cross-lingual Neural Vector Conceptualization. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_59
DOI: https://doi.org/10.1007/978-3-030-32236-6_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://www.ccf.org.cn/