Language Resources and Evaluation, Volume 52, Issue 4, pp 921–948

COVER: a linguistic resource combining common sense and lexicographic information

  • Enrico Mensa
  • Daniele P. Radicioni (email author)
  • Antonio Lieto
Original Paper

Abstract

Lexical resources are fundamental to tackling many tasks that are central to present and prospective research in Text Mining, Information Retrieval, and related areas of Natural Language Processing. In this article we introduce COVER, a novel lexical resource, along with COVERAGE, the algorithm devised to build it. To describe concepts, COVER proposes a compact vectorial representation that combines the lexicographic precision characterizing BabelNet with the rich common-sense knowledge characterizing ConceptNet. We propose COVER as a reliable and mature resource that has been employed in tasks as diverse as conceptual categorization, keyword extraction, and conceptual similarity. The experimental assessment is performed on the last of these tasks: we report and discuss the results obtained and point out future improvements. We conclude that COVER can be directly exploited to build applications and can also be coupled with existing resources.

Keywords

Lexical resources, Lexical semantics, Common sense knowledge, Vector representation, Concept similarity, NLP

Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  • Enrico Mensa (1)
  • Daniele P. Radicioni (1, email author)
  • Antonio Lieto (1)

  1. Computer Science Department, University of Turin, Turin, Italy
