Automatic Acquisition of Wordnet Relations by Distributionally Supported Morphological Patterns Extracted from Polish Corpora

  • Roman Kurc
  • Maciej Piasecki
  • Stan Szpakowicz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6231)

Abstract

Espresso is a pattern-based algorithm for extracting lexical-semantic relations, defined for English. We present its adaptation to Polish. We consider not only technicalities such as the availability of language-processing tools for Polish, but also pattern structures that exploit the specific properties of a strongly inflected language. We propose a new method of computing the reliability measure of extraction; this leads to a modified algorithm, which we have named Estratto. We also investigate the influence of additional lexico-semantic data and of information from generic patterns.
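
The abstract does not spell out Estratto's modified reliability measure, so it is not reproduced here. As background only, the sketch below illustrates the pattern-reliability score of the original Espresso formulation (Pantel and Pennacchiotti, 2006): the average, over known instances, of the instance's reliability weighted by its PMI with the pattern, normalised by the maximum PMI. The function name, the precomputed `pmi` table, and the toy values are assumptions made for illustration, not taken from the paper.

```python
def pattern_reliability(pattern, instances, pmi, instance_rel):
    """Espresso-style reliability of a single pattern (illustrative sketch).

    pmi          -- dict mapping (instance, pattern) -> PMI score; assumed to be
                    precomputed from corpus counts (hypothetical input)
    instance_rel -- dict mapping instance -> reliability in [0, 1]
    """
    if not instances:
        raise ValueError("need at least one instance")
    max_pmi = max(pmi.values())  # maximum over all instance/pattern pairs
    if max_pmi <= 0:
        raise ValueError("maximum PMI must be positive")
    total = 0.0
    for inst in instances:
        # Pairs never observed together contribute nothing.
        score = pmi.get((inst, pattern), 0.0)
        total += (score / max_pmi) * instance_rel[inst]
    return total / len(instances)


# Toy data, invented for illustration only (Polish: kot = cat, pies = dog,
# zwierzę = animal); "p1" and "p2" stand for two hypothetical patterns.
instances = [("kot", "zwierzę"), ("pies", "zwierzę")]
instance_rel = {("kot", "zwierzę"): 1.0, ("pies", "zwierzę"): 0.5}
pmi = {
    (("kot", "zwierzę"), "p1"): 2.0,
    (("pies", "zwierzę"), "p1"): 1.0,
    (("kot", "zwierzę"), "p2"): 0.5,
}

for p in ("p1", "p2"):
    print(p, pattern_reliability(p, instances, pmi, instance_rel))
```

In Espresso, instance reliability is defined symmetrically from pattern reliabilities, and the two scores are recomputed in alternation as the bootstrapping loop adds patterns and instances; that loop is omitted here.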

Keywords

lexical-semantic relations; pattern-based relation extraction; Espresso; Estratto; wordnet expansion

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Roman Kurc (1)
  • Maciej Piasecki (1)
  • Stan Szpakowicz (2, 3)

  1. Institute of Informatics, Wrocław University of Technology, Poland
  2. SITE, University of Ottawa, Canada
  3. Institute of Computer Science, Polish Academy of Sciences, Poland
