
Exploring Implicit Semantic Constraints for Bilingual Word Embeddings


Bilingual word embeddings (BWEs) have proven useful in many cross-lingual natural language processing tasks. Previous studies often rely on bilingual texts or dictionaries, which are scarce resources. As a result, the explicit semantic information these studies exploit, such as monolingual word co-occurrences and cross-lingual semantic equivalences, is often insufficient for BWE learning, which limits the quality of the learned word representations. To overcome this problem, we study how to exploit implicit semantic constraints to learn better BWEs. Concretely, we first discover implicit monolingual word-level semantic equivalences by pivoting through their translations in the other language. Then, we learn BWEs under various semantic constraints. Experimental results on machine translation and cross-lingual document classification demonstrate the effectiveness of our model.
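The pivoting step summarized above can be illustrated with a minimal Python sketch. It assumes word-translation probability tables in both directions (for example, lexical tables estimated by a word aligner); the toy dictionaries below are invented for illustration and are not data from the paper. A candidate equivalent s2 of a source word s1 is scored by the standard pivot triangulation p(s2 | s1) = Σ_t p(t | s1) p(s2 | t), summing over target words t. The names `pivot_equivalences`, `constraint_penalty`, and `threshold` are hypothetical, and the weighted squared-distance penalty is only one illustrative way to turn discovered equivalences into a training constraint; it is not the paper's actual objective.

```python
from collections import defaultdict


def pivot_equivalences(src2tgt, tgt2src, threshold=0.1):
    """Score monolingual equivalents by pivoting through the other language.

    src2tgt: {source word: {target word: p(t | s)}}
    tgt2src: {target word: {source word: p(s | t)}}
    Returns {s1: {s2: p(s2 | s1)}}, keeping pairs whose score >= threshold
    and never pairing a word with itself.
    """
    result = {}
    for s1, translations in src2tgt.items():
        scores = defaultdict(float)
        for t, p_t_given_s1 in translations.items():
            # Each back-translation s2 of pivot word t accumulates probability mass.
            for s2, p_s2_given_t in tgt2src.get(t, {}).items():
                if s2 != s1:
                    scores[s2] += p_t_given_s1 * p_s2_given_t
        kept = {s2: score for s2, score in scores.items() if score >= threshold}
        if kept:
            result[s1] = kept
    return result


def constraint_penalty(emb, equivalences):
    """Hypothetical constraint term: weighted squared Euclidean distance
    between the embeddings of every pivoted (s1, s2) pair."""
    total = 0.0
    for s1, partners in equivalences.items():
        for s2, weight in partners.items():
            if s1 in emb and s2 in emb:
                total += weight * sum((a - b) ** 2 for a, b in zip(emb[s1], emb[s2]))
    return total


# Toy English -> German -> English tables (invented values).
src2tgt = {
    "car": {"Auto": 0.7, "Wagen": 0.3},
    "automobile": {"Auto": 1.0},
    "wagon": {"Wagen": 1.0},
}
tgt2src = {
    "Auto": {"car": 0.6, "automobile": 0.4},
    "Wagen": {"car": 0.5, "wagon": 0.5},
}

equivalences = pivot_equivalences(src2tgt, tgt2src, threshold=0.2)
emb = {"car": [1.0, 0.0], "automobile": [1.0, 0.0], "wagon": [0.0, 1.0]}
print(equivalences)
print(constraint_penalty(emb, equivalences))  # 1.0
```

With `threshold=0.2`, the weak pivot path car → Wagen → wagon (score 0.15) is filtered out, while car/automobile (0.28) survives. The threshold is the knob that trades coverage of implicit equivalences against the noise that pivoting through ambiguous translations introduces.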


Fig. 1
Fig. 2
Fig. 3
Fig. 4







We would like to thank all the reviewers for their constructive and helpful suggestions on this paper.

Author information



Corresponding author

Correspondence to Yidong Chen.

Additional information

The authors were supported by the National Natural Science Foundation of China (Nos. 61672440 and 61573294), the Scientific Research Project of the National Language Committee of China (Grant No. YB135-49), and the Open Fund Project of the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201742).


About this article


Cite this article

Su, J., Song, Z., Lu, Y. et al. Exploring Implicit Semantic Constraints for Bilingual Word Embeddings. Neural Process Lett 48, 1073–1088 (2018).



Keywords

  • Bilingual word embeddings
  • Word alignment
  • Machine translation
  • Document classification