ECO and Onto.PT: a flexible approach for creating a Portuguese wordnet automatically

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

A wordnet is an important tool for developing natural language processing applications for a language. However, most wordnets are handcrafted by experts, which limits their growth. In this article, we propose an automatic approach to create wordnets by exploiting textual resources, dubbed ECO. After extracting semantic relation instances, identified by discriminating textual patterns, ECO discovers synonymy clusters, used as synsets, and attaches the remaining relations to suitable synsets. Besides introducing each step of ECO, we report on how it was implemented to create Onto.PT, a public lexical ontology for Portuguese. Onto.PT is the result of the automatic exploitation of Portuguese dictionaries and thesauri, and it aims to minimise the main limitations of existing Portuguese lexical knowledge bases.
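
The three steps named in the abstract (pattern-based extraction, synset discovery, attachment of the remaining relations) can be illustrated with a small Python sketch. This is not the authors' implementation: the two definition patterns, the function names, and the toy dictionary entries are all assumptions made for illustration. Connected components stand in for whatever clustering ECO actually uses, and the attachment step is a plain lookup with no sense disambiguation.

```python
import re
from collections import defaultdict

# Hypothetical patterns over dictionary definitions (illustrative assumptions,
# not the pattern set used to build Onto.PT).
PATTERNS = [
    (re.compile(r"^o mesmo que (\w+)"), "synonym_of"),
    (re.compile(r"^(?:tipo|espécie) de (\w+)"), "hyponym_of"),
]


def extract_triples(entries):
    """Step 1: turn (headword, definition) pairs into word-level triples."""
    triples = []
    for headword, definition in entries:
        for pattern, relation in PATTERNS:
            match = pattern.match(definition.lower())
            if match:
                triples.append((headword, relation, match.group(1)))
    return triples


def discover_synsets(triples):
    """Step 2: group words linked by synonymy (simple connected components)."""
    graph = defaultdict(set)
    for a, relation, b in triples:
        if relation == "synonym_of":
            graph[a].add(b)
            graph[b].add(a)
    synsets, seen = [], set()
    for word in sorted(graph):
        if word in seen:
            continue
        stack, component = [word], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        synsets.append(frozenset(component))
    return synsets


def attach_relations(triples, synsets):
    """Step 3: lift the non-synonymy triples from words to synsets."""
    def synset_of(word):
        for synset in synsets:
            if word in synset:
                return synset
        return frozenset([word])  # unclustered words get a singleton synset

    return {
        (synset_of(a), relation, synset_of(b))
        for a, relation, b in triples
        if relation != "synonym_of"
    }


# Toy input: three invented dictionary entries.
entries = [
    ("automóvel", "o mesmo que carro"),
    ("viatura", "o mesmo que automóvel"),
    ("carro", "tipo de veículo"),
]
word_triples = extract_triples(entries)
synsets = discover_synsets(word_triples)
for triple in attach_relations(word_triples, synsets):
    print(triple)
```

In this toy run the two synonymy instances merge into one synset, and the hypernymy instance found for "carro" ends up attached to that whole synset rather than to the single word, which is the effect the abstract describes.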

Notes

  1. See http://mwnpt.di.fc.ul.pt/.

  2. Available from http://openthesaurus.caixamagica.pt/.

  3. Available from http://pt.wiktionary.org/.

  4. See http://www.linguateca.pt/PAPEL/.

  5. Available from http://ontopt.dei.uc.pt/index.php?sec=consultar.

  6. See website at http://www.globalwordnet.org/.

  7. Available from http://www.globalwordnet.org/gwa/ewn_to_bc/corebcs.html.

  8. A separate set with hypernymy was created because almost half of the sb-triples in Onto.PT are hypernymy, which is also one of the most widely used semantic relations.

  9. Judges were advised to use online dictionaries, if needed.

Acknowledgments

The work described here was developed in the scope of Hugo Gonçalo Oliveira’s PhD dissertation, conducted at CISUC, University of Coimbra, under the supervision of Paulo Gomes, and supported by the FCT scholarship grant SFRH/BD/44955/2008, co-funded by FSE.

Author information

Corresponding author

Correspondence to Hugo Gonçalo Oliveira.

About this article

Cite this article

Gonçalo Oliveira, H., Gomes, P. ECO and Onto.PT: a flexible approach for creating a Portuguese wordnet automatically. Lang Resources & Evaluation 48, 373–393 (2014). https://doi.org/10.1007/s10579-013-9249-9

  • DOI: https://doi.org/10.1007/s10579-013-9249-9
