Comparing Semantic Relatedness between Word Pairs in Portuguese Using Wikipedia

  • Roger Granada
  • Cassia Trojahn
  • Renata Vieira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8775)


The growth of available data in digital format has facilitated the development of new models that automatically infer the semantic similarity between word pairs. However, many natural languages still lack sufficient resources to evaluate measures of semantic relatedness. In this paper, we translated word pairs from a well-known baseline for evaluating semantic relatedness measures into Portuguese and performed a manual evaluation of each pair. We compared the correlation with similar datasets in other languages and generated LSA models from Wikipedia articles in order to verify the pertinence of each dataset and how well semantic similarity carries over across languages.
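The evaluation step described in the abstract, comparing model similarity scores against human ratings, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not say which correlation coefficient was used, so Spearman's rank correlation is an assumption here. The word pairs, the ratings, and the three-dimensional vectors are hypothetical stand-ins for a translated dataset and for vectors taken from an LSA model.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        return 0.0
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical vectors standing in for rows of an LSA term matrix.
vectors = {
    "rapaz": [0.9, 0.1, 0.1], "menino": [0.8, 0.2, 0.1],
    "costa": [0.1, 0.9, 0.2], "praia": [0.2, 0.8, 0.1],
    "vidro": [0.1, 0.2, 0.9], "copo": [0.3, 0.3, 0.7],
    "galo": [0.9, 0.0, 0.2], "viagem": [0.0, 0.9, 0.3],
}

# Hypothetical human relatedness ratings (invented for illustration).
human_ratings = {
    ("rapaz", "menino"): 3.6,
    ("costa", "praia"): 3.9,
    ("vidro", "copo"): 3.0,
    ("galo", "viagem"): 0.1,
}

pairs = list(human_ratings)
model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
human_scores = [human_ratings[p] for p in pairs]
rho = spearman(human_scores, model_scores)
print(f"Spearman correlation: {rho:.2f}")
```

Rank correlation is used in this sketch because it compares only the ordering of the pairs, so the human rating scale and the cosine range do not need to match.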


Semantic relatedness, semantic similarity, similarity dataset





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Roger Granada (1)
  • Cassia Trojahn (2)
  • Renata Vieira (3)
  1. PUCRS & IRIT - Toulouse, France
  2. UTM & IRIT - Toulouse, France
  3. PUCRS - Porto Alegre, Brazil
