Improving Word Representations Using Paraphrase Dataset

  • Flávio Arthur O. Santos
  • Hendrik T. Macedo
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 738)


Abstract

Recently, the NLP community has focused on methods for learning good vector representations of words. Such representations should capture semantic relationships between words through simple vector arithmetic. Currently, two methods stand out: GloVe and word2vec. We argue that proper use of knowledge bases such as WordNet, Freebase and Paraphrase can further improve the results of these methods. Although incorporating information from knowledge bases into vector word representations is not a new idea, previous work has not compared its results with those of GloVe or word2vec. In this paper, we propose a method to incorporate the knowledge of the Paraphrase knowledge base into GloVe. Results show that this incorporation improves on GloVe's original results on at least three different benchmarks.
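The abstract rests on two technical ideas: GloVe-style training, and probing semantic relationships with vector arithmetic. What follows is a minimal Python sketch under stated assumptions. It implements the published GloVe objective. That objective is a weighted least-squares fit of word-context dot products plus biases to log co-occurrence counts, with weighting f(x) = (x/x_max)^α capped at 1, and x_max = 100 and α = 3/4 as in the GloVe paper. The sketch then adds a squared-distance penalty that pulls the vectors of paraphrase pairs together. That penalty, its weight `lam`, the toy vectors, and all function names (`glove_loss`, `paraphrase_penalty`, `combined_loss`, `analogy`) are hypothetical illustrations. They are not necessarily the incorporation method proposed in this paper.

```python
import math


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): (x / x_max) ** alpha below x_max, 1 otherwise."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def cosine(u, v):
    # Assumes neither vector is all zeros.
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def glove_loss(W, C, b, c, cooc):
    """GloVe weighted least squares.

    W, C: word and context vectors (lists of lists); b, c: their biases;
    cooc: dict mapping (i, j) to a non-zero co-occurrence count X_ij.
    """
    total = 0.0
    for (i, j), x in cooc.items():
        err = dot(W[i], C[j]) + b[i] + c[j] - math.log(x)
        total += glove_weight(x) * err ** 2
    return total


def paraphrase_penalty(W, pairs, lam):
    """HYPOTHETICAL knowledge-base term: lam * sum of ||w_i - w_j||^2
    over paraphrase pairs (i, j). Illustrative only."""
    return lam * sum(
        sum((p - q) ** 2 for p, q in zip(W[i], W[j])) for i, j in pairs
    )


def combined_loss(W, C, b, c, cooc, pairs, lam=0.5):
    # Assumed combination: GloVe objective plus the hypothetical penalty.
    return glove_loss(W, C, b, c, cooc) + paraphrase_penalty(W, pairs, lam)


def analogy(vecs, a, b, c):
    """Solve 'a is to b as c is to ?' by maximizing cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vecs[w], target))


# Toy 3-d vectors (made up for illustration, not trained embeddings).
vecs = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.3, 0.0],
}
print(analogy(vecs, "man", "king", "woman"))  # queen
```

Minimizing `combined_loss` with a gradient-based optimizer would trade fit to the co-occurrence statistics against closeness of paraphrase pairs. GloVe itself is trained with AdaGrad. The parameter `lam` controls that balance, and its default of 0.5 here is arbitrary.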


Keywords

GloVe · Paraphrase · Knowledge base · Word embeddings · Natural language processing



Acknowledgments

The authors thank CAPES and FAPITEC-SE for financial support [CAPES/FAPITEC/SE Call No. 11/2016 - PROEF, Process 88887.160994/2017-00] and LCAD-UFS for providing a computing cluster on which the experiments were run. The authors also thank FAPITEC-SE for granting a graduate scholarship to Flávio Santos, and CNPq for granting a productivity scholarship to Hendrik Macedo [DT-II, Process 310446/2014-7].


References

  1. E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Paşca, A. Soroa, A study on similarity and relatedness using distributional and WordNet-based approaches, in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Association for Computational Linguistics, 2009), pp. 19–27
  2. S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: a nucleus for a web of open data, in The Semantic Web (Springer, Berlin, 2007), pp. 722–735
  3. K. Bollacker, C. Evans, P. Paritosh, T. Sturge, J. Taylor, Freebase: a collaboratively created graph database for structuring human knowledge, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (ACM, New York, 2008), pp. 1247–1250
  4. A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko, Translating embeddings for modeling multi-relational data, in Advances in Neural Information Processing Systems (2013), pp. 2787–2795
  5. E. Bruni, N.-K. Tran, M. Baroni, Multimodal distributional semantics. J. Artif. Intell. Res. 49, 1–47 (2014)
  6. A. Budanitsky, G. Hirst, Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures, in Workshop on WordNet and Other Lexical Resources, vol. 2 (2001), p. 2
  7. A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E.R. Hruschka Jr., T.M. Mitchell, Toward an architecture for never-ending language learning, in AAAI, vol. 5 (2010), p. 3
  8. C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, T. Robinson, One billion word benchmark for measuring progress in statistical language modeling (2013, preprint). arXiv:1312.3005
  9. R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
  10. J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
  11. D. Fried, K. Duh, Incorporating both distributional and relational semantics in word representations (2014, preprint). arXiv:1412.4369
  12. J. Ganitkevitch, B. Van Durme, C. Callison-Burch, PPDB: the paraphrase database, in Proceedings of NAACL-HLT (Association for Computational Linguistics, Atlanta, GA, 2013), pp. 758–764
  13. M. Gutmann, A. Hyvärinen, Noise-contrastive estimation: a new estimation principle for unnormalized statistical models, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (2010), pp. 297–304
  14. F. Hill, R. Reichart, A. Korhonen, SimLex-999: evaluating semantic models with (genuine) similarity estimation. Comput. Linguist. 41, 665–695 (2015)
  15. C.A.E.M. Júnior, L.A. Barbosa, H.T. Macedo, Uma arquitetura híbrida LSTM-CNN para reconhecimento de entidades nomeadas em textos naturais em língua portuguesa [A hybrid LSTM-CNN architecture for named entity recognition in natural-language texts in Portuguese] (São Cristóvão, SE, 2016)
  16. H. Lakkaraju, R. Socher, C. Manning, Aspect specific sentiment analysis using hierarchical deep learning, in NIPS Workshop on Deep Learning and Representation Learning (2014)
  17. T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems (2013), pp. 3111–3119
  18. G.A. Miller, WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
  19. R. Paulus, C. Xiong, R. Socher, A deep reinforced model for abstractive summarization (2017, preprint). arXiv:1705.04304
  20. J. Pennington, R. Socher, C.D. Manning, GloVe: global vectors for word representation, in EMNLP, vol. 14 (2014), pp. 1532–1543
  21. R. Socher, D. Chen, C.D. Manning, A. Ng, Reasoning with neural tensor networks for knowledge base completion, in Advances in Neural Information Processing Systems (2013), pp. 926–934
  22. Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., Google's neural machine translation system: bridging the gap between human and machine translation (2016, preprint). arXiv:1609.08144
  23. C. Xiong, V. Zhong, R. Socher, Dynamic coattention networks for question answering (2016, preprint). arXiv:1611.01604

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Flávio Arthur O. Santos (1)
  • Hendrik T. Macedo (1)

  1. Computer Science Postgraduate Program, Federal University of Sergipe, São Cristóvão, Brazil
