
Whole-Part Relations Rule-Based Automatic Identification: Issues from Fine-Grained Error Analysis

  • Ilia Markov
  • Nuno Mamede
  • Jorge Baptista
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8856)

Abstract

In this paper, we focus on the most frequent errors that occurred during the implementation of a rule-based module for semantic relation extraction, which has been integrated into STRING, a hybrid statistical and rule-based Natural Language Processing chain for Portuguese. We concentrate on whole-part relations (meronymy), that is, the semantic relation between a whole and an entity perceived as one of its constituent parts, or as a member of a set. In particular, we target the type of meronymy involving human entities and body-part nouns. We describe in some detail the decisions made to overcome the errors produced by the system and the solutions adopted to improve its performance.
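To make the target relation concrete, the following is a toy pattern-matching sketch in Python of a whole-part relation between a human entity and a body-part noun, e.g. *a mão do Pedro* ("Pedro's hand"). It is not the rule-based module integrated in STRING: the lexicons `BODY_PART_NOUNS` and `HUMAN_NOUNS`, the `WholePart` type, the function `extract_whole_part` and the single surface pattern `<body-part> de/do/da/dos/das <human>` are all hypothetical simplifications chosen for illustration.

```python
import re
from typing import List, NamedTuple

# Hypothetical toy lexicons (assumption): a real system would rely on a
# curated body-part lexicon and on named-entity / human-noun classification.
BODY_PART_NOUNS = {"mão", "braço", "cabeça", "perna", "pé", "olho", "olhos"}
HUMAN_NOUNS = {"pedro", "maria", "joão", "menino", "menina", "médico", "médica"}

# Forms of the preposition "de", alone or contracted with a definite article.
DE_FORMS = {"de", "do", "da", "dos", "das"}


class WholePart(NamedTuple):
    whole: str  # the human entity (the whole)
    part: str   # the body-part noun (the part)


def extract_whole_part(sentence: str) -> List[WholePart]:
    """Find toy WHOLE-PART pairs matching <body-part> <de-form> <human>."""
    # For str patterns, \w is Unicode-aware, so accented words stay whole.
    tokens = re.findall(r"\w+", sentence)
    relations = []
    # Slide over consecutive token triples (part, preposition, whole).
    for part, prep, whole in zip(tokens, tokens[1:], tokens[2:]):
        if (part.lower() in BODY_PART_NOUNS
                and prep.lower() in DE_FORMS
                and whole.lower() in HUMAN_NOUNS):
            relations.append(WholePart(whole=whole, part=part))
    return relations


if __name__ == "__main__":
    for s in ["A mão do Pedro estava fria.",
              "O médico examinou os olhos da menina.",
              "A perna da mesa partiu-se."]:
        print(s, "->", extract_whole_part(s))
```

On the three sample sentences, the first two each yield one relation (Pedro/mão, menina/olhos) and the third yields none, because *mesa* ("table") is not a human noun; this mirrors the restriction to human wholes stated above. A flat surface pattern like this ignores syntactic structure, possessives such as *a sua mão* and coreference, so it is far cruder than a parser-based rule module.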

Keywords

whole-part relation, meronymy, body-part noun, Portuguese, error analysis



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ilia Markov (1)
  • Nuno Mamede (2, 3)
  • Jorge Baptista (3, 4)
  1. Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), México D.F., Mexico
  2. Universidade do Algarve/FCHS and CECL, Faro, Portugal
  3. Spoken Language Lab, INESC-ID Lisboa/L2F, Lisboa, Portugal
  4. Universidade de Lisboa/IST, Lisboa, Portugal
