COVER: a linguistic resource combining common sense and lexicographic information

Abstract

Lexical resources are fundamental to many tasks that are central to present and prospective research in Text Mining, Information Retrieval, and related areas of Natural Language Processing. In this article we introduce COVER, a novel lexical resource, along with COVERAGE, the algorithm devised to build it. To describe concepts, COVER proposes a compact vectorial representation that combines the lexicographic precision of BabelNet with the rich common-sense knowledge of ConceptNet. We propose COVER as a reliable and mature resource that has been employed in tasks as diverse as conceptual categorization, keyword extraction, and conceptual similarity. The experimental assessment is performed on the last task: we report and discuss the results obtained and point out future improvements. We conclude that COVER can be directly exploited to build applications and can also be coupled with existing resources.

Notes

  1. “When people communicate with each other, they rely on shared background knowledge to understand each other: knowledge about the way objects relate to each other in the world, people’s goals in their daily lives, the emotional content of events or situations. This ‘taken for granted’ information is what we call common sense—obvious things people normally know and usually leave unstated” (Cambria et al. 2010, p. 15).

  2. The representational limitation of this ontological resource has also led to the development of hybrid knowledge representation systems such as Dual-PECCS (Lieto et al. 2017a), which adopts OpenCyc to encode taxonomic information and delegates the task of representing common-sense knowledge to different integrated frameworks.

  3. http://commoncrawl.org.

  4. Of course, not all information available in ConceptNet can be directly mapped onto BSIs (e.g., the compound word “Something you find inside” has no counterpart in BabelNet/NASARI).

  5. InstanceOf, RelatedTo, IsA, AtLocation, dbpedia/genre, Synonym, DerivedFrom, Causes, UsedFor, MotivatedByGoal, HasSubevent, Antonym, CapableOf, Desires, CausesDesire, PartOf, HasProperty, HasPrerequisite, MadeOf, CompoundDerivedFrom, HasFirstSubevent, dbpedia/field, dbpedia/knownFor, dbpedia/influencedBy, dbpedia/influenced, DefinedAs, HasA, MemberOf, ReceivesAction, SimilarTo, dbpedia/influenced, SymbolOf, HasContext, NotDesires, ObstructedBy, HasLastSubevent, NotUsedFor, NotCapableOf, DesireOf, NotHasProperty, CreatedBy, Attribute, Entails, LocationOfAction, LocatedNear.

  6. http://corpus.byu.edu/full-text/.

  7. The parameter \(\beta\) has been set to 2 to build the released resource.

  8. Presently set to 0.6.

  9. The parameters \(\alpha\) and \(\beta\) were set to 0.8 and 0.2, respectively, for the experimentation.

  10. Publicly available at the URL http://www.seas.upenn.edu/~hansens/conceptSim/.

  11. Namely, the 34 domains available in BabelDomains, http://lcl.uniroma1.it/babeldomains/.

References

  1. Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Paşca, M., & Soroa, A. (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of NAACL, NAACL ’09 (pp. 19–27). Association for Computational Linguistics.

  2. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. In The semantic web (pp. 722–735).

  3. Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th international conference on computational linguistics (Vol. 1, pp. 86–90). Association for Computational Linguistics.

  4. Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL (Vol. 1, pp. 238–247).

  5. Bosco, C., Patti, V., & Bolioli, A. (2013). Developing corpora for sentiment analysis: The case of irony and Senti-TUT. IEEE Intelligent Systems, 28(2), 55–63.

  6. Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.

  7. Camacho-Collados, J., Pilehvar, M. T., Collier, N., & Navigli, R. (2017). SemEval-2017 task 2: Multilingual and cross-lingual semantic word similarity. In Proceedings of the 11th international workshop on semantic evaluation (SemEval 2017), Vancouver, Canada.

  8. Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2015). A unified multilingual semantic representation of concepts. In Proceedings of ACL, Beijing, China.

  9. Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2015). NASARI: A novel approach to a semantically-aware representation of items. In Proceedings of NAACL (pp. 567–577).

  10. Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2016). NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence, 240, 36–64.

  11. Cambria, E., Schuller, B., Liu, B., Wang, H., & Havasi, C. (2013). Knowledge-based approaches to concept-level sentiment analysis. IEEE Intelligent Systems, 28(2), 12–14.

  12. Cambria, E., Speer, R., Havasi, C., & Hussain, A. (2010). SenticNet: A publicly available semantic resource for opinion mining. In AAAI fall symposium: Commonsense knowledge (Vol. 10).

  13. Ciaramita, M., & Johnson, M. (2003). Supersense tagging of unknown nouns in WordNet. In Proceedings of the 2003 conference on empirical methods in natural language processing (pp. 168–175). Association for Computational Linguistics.

  14. Colla, D., Mensa, E., & Radicioni, D. P. (2017). Semantic measures for keywords extraction. In AI*IA 2017: Advances in artificial intelligence. Lecture notes for artificial intelligence. Springer.

  15. Colla, D., Mensa, E., Radicioni, D. P., & Lieto, A. (2018). Tell me why: Computational explanation of conceptual similarity judgments. In Proceedings of the 17th international conference on information processing and management of uncertainty in knowledge-based systems (IPMU), special session on advances on explainable artificial intelligence, communications in computer and information science (CCIS). Springer, Cham.

  16. Denecke, K. (2008). Using SentiWordNet for multilingual sentiment analysis. In IEEE 24th international conference on data engineering workshop, 2008. ICDEW 2008 (pp. 507–512). IEEE.

  17. Derrac, J., & Schockaert, S. (2015). Inducing semantic relations from conceptual spaces: A data-driven approach to plausible reasoning. Artificial Intelligence, 228, 66–94.

  18. Devitt, A., & Ahmad, K. (2013). Is there a language of sentiment? An analysis of lexical resources for sentiment analysis. Language Resources and Evaluation, 47(2), 475–511.

  19. Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., & Smith, N. A. (2014). Retrofitting word vectors to semantic lexicons. arXiv preprint arXiv:1411.4166.

  20. Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2001). Placing search in context: The concept revisited. In Proceedings of the 10th international conference on world wide web (pp. 406–414). ACM.

  21. Francopoulo, G., Bel, N., George, M., Calzolari, N., Monachini, M., Pet, M., et al. (2009). Multilingual resources for NLP in the lexical markup framework (LMF). Language Resources and Evaluation, 43(1), 57–70.

  22. Ganitkevitch, J., Van Durme, B., & Callison-Burch, C. (2013). PPDB: The paraphrase database. In Proceedings of NAACL-HLT (pp. 758–764).

  23. Gärdenfors, P. (2014). The geometry of meaning: Semantics based on conceptual spaces. Cambridge: MIT Press.

  24. Gînscă, A.-L., Boroş, E., Iftene, A., Trandabăţ, D., Toader, M., Corîci, M., Perez, C.-A., & Cristea, D. (2011). Sentimatrix: Multilingual sentiment analysis service. In Proceedings of the 2nd workshop on computational approaches to subjectivity and sentiment analysis (pp. 189–195). Association for Computational Linguistics.

  25. Harabagiu, S., & Moldovan, D. (2003). Question answering. In The Oxford handbook of computational linguistics. Oxford University Press.

  26. Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.

  27. Havasi, C., Speer, R., & Alonso, J. (2007). ConceptNet: A lexical resource for common sense knowledge. In Recent advances in natural language processing V: Selected papers from RANLP (Vol. 309, p. 269).

  28. Hovy, E. (2003). Text summarization. In The Oxford handbook of computational linguistics (2nd edn.). Oxford University Press.

  29. Jean-Louis, L., Zouaq, A., Gagnon, M., & Ensan, F. (2014). An assessment of online semantic annotators for the keyword extraction task. In Pacific Rim international conference on artificial intelligence (pp. 548–560). Springer.

  30. Jiang, J. J., & Conrath, D. W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008.

  31. Jimenez, S., Becerra, C., Gelbukh, A., Bátiz, A. J. D., & Mendizábal, A. (2013). Softcardinality-core: Improving text overlap with distributional measures for semantic textual similarity. In Proceedings of *SEM 2013 (Vol. 1, pp. 194–201).

  32. Langley, P. (2012). The cognitive systems paradigm. Advances in Cognitive Systems, 1, 3–13.

  33. Leacock, C., Miller, G. A., & Chodorow, M. (1998). Using corpus statistics and WordNet relations for sense identification. Computational Linguistics, 24(1), 147–165.

  34. Lenat, D. B., Prakash, M., & Shepherd, M. (1985). CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. AI Magazine, 6(4), 65.

  35. Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: University of Chicago Press.

  36. Lieto, A., Minieri, A., Piana, A., Radicioni, D. P., & Frixione, M. (2014). A dual process architecture for ontology-based systems. In 6th international conference on knowledge engineering and ontology development, KEOD 2014 (pp. 48–55). INSTICC Press.

  37. Lieto, A., Lebiere, C., & Oltramari, A. (2018). The knowledge level in cognitive architectures: Current limitations and possible developments. Cognitive Systems Research, 48, 39–55.

  38. Lieto, A., Mensa, E., & Radicioni, D. P. (2016). A resource-driven approach for anchoring linguistic resources to conceptual spaces. In Proceedings of the XVth international conference of the Italian association for artificial intelligence, Genova, Italy, November 29–December 1, 2016, volume 10037 of lecture notes in artificial intelligence (pp. 435–449). Springer.

  39. Lieto, A., Mensa, E., & Radicioni, D. P. (2016). Taming sense sparsity: A common-sense approach. In Proceedings of third Italian conference on computational linguistics (CLiC-it 2016) and fifth evaluation campaign of natural language processing and speech tools for Italian.

  40. Lieto, A., Minieri, A., Piana, A., & Radicioni, D. P. (2015). A knowledge-based system for prototypical reasoning. Connection Science, 27(2), 137–152.

  41. Lieto, A., & Radicioni, D. P. (2016). From human to artificial cognition and back: New perspectives on cognitively inspired AI systems. Cognitive Systems Research, 39, 1–3.

  42. Lieto, A., Radicioni, D. P., & Rho, V. (2015). A common-sense conceptual categorization system integrating heterogeneous proxytypes and the dual process of reasoning. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 875–881), Buenos Aires, July 2015. AAAI Press.

  43. Lieto, A., Radicioni, D. P., & Rho, V. (2017). Dual PECCS: A cognitive system for conceptual representation and categorization. Journal of Experimental and Theoretical Artificial Intelligence, 29(2), 433–452.

  44. Lieto, A., Radicioni, D. P., Rho, V., & Mensa, E. (2017). Towards a unifying framework for conceptual represention and reasoning in cognitive systems. Intelligenza Artificiale, 11(2), 139–153.

  45. Liu, H., & Singh, P. (2004). ConceptNet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22(4), 211–226.

  46. Marujo, L., Ribeiro, R., de Matos, D. M., Neto, J. P., Gershman, A., & Carbonell, J. (2012). Key phrase extraction of lightly filtered broadcast news. In Proceedings of 15th international conference on text, speech and dialogue (TSD 2012). Springer.

  47. McCrae, J., Aguado-de Cea, G., Buitelaar, P., Cimiano, P., Declerck, T., Gómez-Pérez, A., et al. (2012). Interchanging lexical resources on the semantic web. Language Resources and Evaluation, 46(4), 701–719.

  48. Mensa, E., Radicioni, D. P., & Lieto, A. (2017). MeRaLi at SemEval-2017 task 2 subtask 1: A cognitively inspired approach. In Proceedings of the international workshop on semantic evaluation (SemEval 2017). Association for Computational Linguistics.

  49. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. CoRR abs/1301.3781.

  50. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).

  51. Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.

  52. Miller, G. A., & Charles, W. G. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1), 1–28.

  53. Miller, G. A., & Fellbaum, C. (2007). WordNet then and now. Language Resources and Evaluation, 41(2), 209–214.

  54. Mimno, D. M., Wallach, H. M., Talley, E. M., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. In EMNLP (pp. 262–272). ACL.

  55. Minsky, M. (2000). Commonsense-based interfaces. Communications of the ACM, 43(8), 66–73.

  56. Moro, A., Cecconi, F., & Navigli, R. (2014). Multilingual word sense disambiguation and entity linking for everybody. In Proceedings of the 2014 international conference on posters and demonstrations track (Vol. 1272, pp. 25–28). CEUR-WS.org.

  57. Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 10.

  58. Navigli, R., & Ponzetto, S. P. (2010). BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 216–225). Association for Computational Linguistics.

  59. Navigli, R., & Ponzetto, S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250.

  60. Newman, D., Noh, Y., Talley, E., Karimi, S., & Baldwin, T. (2010). Evaluating topic models for digital libraries. In The ACM/IEEE joint conference on digital libraries (JCDL2010), Gold Coast, Australia. ACM.

  61. Palmer, M., Babko-Malaya, O., & Dang, H. T. (2004). Different sense granularities for different applications. In Proceedings of workshop on scalable natural language understanding.

  62. Pedersen, T., Banerjee, S., & Patwardhan, S. (2005). Maximizing semantic relatedness to perform word sense disambiguation. University of Minnesota supercomputing institute research report UMSI, 25, 2005.

  63. Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet::Similarity: Measuring the relatedness of concepts. In Demonstration papers at HLT-NAACL 2004 (pp. 38–41). Association for Computational Linguistics.

  64. Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In EMNLP (Vol. 14, pp. 1532–1543).

  65. Pilehvar, M. T., & Navigli, R. (2015). From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artificial Intelligence, 228, 95–128.

  66. Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007.

  67. Resnik, P. (1998). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11(1), 95–130.

  68. Richardson, R., Smeaton, A. F., & Murphy, J. (1994). Using WordNet as a knowledge base for measuring semantic similarity between words. In Proceedings of AICS conference (pp. 1–15).

  69. Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104(3), 192–233.

  70. Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.

  71. Schwartz, H. A., & Gomez, F. (2008). Acquiring knowledge from the web to be used as selectors for noun sense disambiguation. In Proceedings of the twelfth conference on computational natural language learning (pp. 105–112). ACL.

  72. Schwartz, H. A., & Gomez, F. (2011). Evaluating semantic metrics on tasks of concept similarity. In Proceedings of the international Florida artificial intelligence research society conference (FLAIRS) (p. 324).

  73. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47.

  74. Speer, R., & Chin, J. (2016). An ensemble method to produce high-quality word embeddings. arXiv preprint arXiv:1604.01692.

  75. Speer, R., Chin, J., & Havasi, C. (2017). ConceptNet 5.5: An open multilingual graph of general knowledge. In AAAI (pp. 4444–4451).

  76. Speer, R., & Havasi, C. (2012). Representing general relational knowledge in ConceptNet 5. In LREC (pp. 3679–3686).

  77. Speer, R., & Lowry-Duda, J. (2017). ConceptNet at SemEval-2017 task 2: Extending word embeddings with multilingual relational knowledge. CoRR abs/1704.03560.

  78. Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics, 32(3), 379–416.

  79. Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327.

  80. Vossen, P., & Fellbaum, C. (2009). Multilingual framenets in computational lexicography: Methods and applications, chapter Universals and idiosyncrasies in multilingual WordNets. Trends in linguistics: Studies and monographs. Mouton de Gruyter.

  81. Wu, Z., & Palmer, M. (1994). Verbs semantics and lexical selection. In Proceedings of the 32nd annual meeting on association for computational linguistics (pp. 133–138). ACL.

  82. Yampolskiy, R. (2013). Turing test as a defining feature of AI-completeness. In Artificial intelligence, evolutionary computing and metaheuristics (pp. 3–17).

  83. Yarlett, D., & Ramscar, M. (2008). Language learning through similarity-based generalization. Unpublished Ph.D. thesis, Stanford University.

Author information

Corresponding author

Correspondence to Daniele P. Radicioni.

About this article

Cite this article

Mensa, E., Radicioni, D.P. & Lieto, A. COVER: a linguistic resource combining common sense and lexicographic information. Lang Resources & Evaluation 52, 921–948 (2018). https://doi.org/10.1007/s10579-018-9417-z

Keywords

  • Lexical resources
  • Lexical semantics
  • Common sense knowledge
  • Vector representation
  • Concept similarity
  • NLP