Language Resources and Evaluation, Volume 48, Issue 2, pp 373–393

ECO and Onto.PT: a flexible approach for creating a Portuguese wordnet automatically

  • Hugo Gonçalo Oliveira
  • Paulo Gomes
Original Paper

Abstract

A wordnet is an important tool for developing natural language processing applications for a language. However, most wordnets are handcrafted by experts, which limits their growth. In this article, we propose an automatic approach to create wordnets by exploiting textual resources, dubbed ECO. After extracting semantic relation instances, identified by discriminating textual patterns, ECO discovers synonymy clusters, used as synsets, and attaches the remaining relations to suitable synsets. Besides introducing each step of ECO, we report on how it was implemented to create Onto.PT, a public lexical ontology for Portuguese. Onto.PT is the result of the automatic exploitation of Portuguese dictionaries and thesauri, and it aims to minimise the main limitations of existing Portuguese lexical knowledge bases.
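
The three steps named in the abstract (relation extraction, synset discovery, attaching relations to synsets) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration: the English patterns, the function names, and the use of connected components as the clustering step are hypothetical stand-ins, not the patterns or algorithms used in ECO, and word-sense ambiguity is ignored entirely.

```python
import re
from collections import defaultdict

# Hypothetical toy patterns; NOT the patterns used by the authors.
PATTERNS = [
    (re.compile(r"^same as (\w+)$"), "synonym_of"),
    (re.compile(r"^a kind of (\w+)$"), "hyponym_of"),
    (re.compile(r"^a part of (\w+)$"), "part_of"),
]


def extract_relations(definitions):
    """Step 1: turn (headword, definition) pairs into (word, relation, word) triples."""
    triples = []
    for headword, text in definitions:
        for pattern, relation in PATTERNS:
            match = pattern.match(text)
            if match:
                triples.append((headword, relation, match.group(1)))
    return triples


def cluster_synonyms(triples):
    """Step 2: group words linked by synonymy into synsets.

    Connected components (via union-find) are an assumed, simplistic
    stand-in for a real clustering algorithm.
    """
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path compression
            word = parent[word]
        return word

    for left, relation, right in triples:
        root_left, root_right = find(left), find(right)
        if relation == "synonym_of":
            parent[root_left] = root_right

    groups = defaultdict(set)
    for word in parent:
        groups[find(word)].add(word)
    return [frozenset(group) for group in groups.values()]


def attach_relations(triples, synsets):
    """Step 3: lift the non-synonymy triples from words to the synsets containing them."""
    synset_of = {word: synset for synset in synsets for word in synset}
    return {
        (synset_of[left], relation, synset_of[right])
        for left, relation, right in triples
        if relation != "synonym_of"
    }


# Toy dictionary entries, invented for this example.
definitions = [
    ("car", "same as automobile"),
    ("automobile", "a kind of vehicle"),
    ("wheel", "a part of car"),
]
triples = extract_relations(definitions)
synsets = cluster_synonyms(triples)
relations = attach_relations(triples, synsets)
```

In this toy run, `car` and `automobile` end up in one synset, and the hyponymy and part-of relations are held between synsets rather than between individual words. A realistic system would also have to decide which sense of an ambiguous word each relation refers to, which this sketch does not attempt.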

Keywords

Information extraction, Lexical ontology, Wordnet, Clustering, Semantic relations

Notes

Acknowledgments

The work described here was developed in the scope of Hugo Gonçalo Oliveira’s PhD dissertation, conducted at CISUC, University of Coimbra, under the supervision of Paulo Gomes, and supported by the FCT scholarship grant SFRH/BD/44955/2008, co-funded by FSE.

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. CISUC, Departamento de Engenharia Informática, Faculdade de Ciências e Tecnologia, Universidade de Coimbra, Coimbra, Portugal