Comparing Window and Syntax Based Strategies for Semantic Extraction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5190)

Abstract

In this paper, we describe and compare two different approaches to extracting similar words from large corpora. In particular, we compare a method based on syntactic contexts with two strategies that rely on windows of tagged words, one using word order and the other using bags of words. On a Portuguese corpus of 12 million words, syntactic contexts produce significantly better results for both frequent and less frequent words.
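
As a rough illustration of the two window-based strategies named in the abstract, the sketch below builds context counts for a target lemma from a list of tagged tokens. It is not the paper's implementation: the function names, the window size, the toy sentence, and the choice to encode word order as a left/right marker are all assumptions made for this example. The syntax-based strategy is not shown because it would require a parser.

```python
from collections import Counter

def window_contexts(tokens, index, size=2, keep_order=True):
    """Collect the neighbours of tokens[index] inside a +/- size window.

    tokens is a list of (lemma, tag) pairs. With keep_order=True each
    neighbour is marked as left ("L") or right ("R") of the target
    (an assumed encoding of word order); otherwise the window is
    treated as an unordered bag of tagged words.
    """
    contexts = []
    for offset in range(-size, size + 1):
        pos = index + offset
        if offset == 0 or pos < 0 or pos >= len(tokens):
            continue  # skip the target itself and positions outside the text
        lemma, tag = tokens[pos]
        if keep_order:
            side = "L" if offset < 0 else "R"
            contexts.append((side, lemma, tag))
        else:
            contexts.append((lemma, tag))
    return contexts

def context_vector(tokens, target, size=2, keep_order=True):
    """Count the window contexts over every occurrence of the target lemma."""
    vector = Counter()
    for i, (lemma, _tag) in enumerate(tokens):
        if lemma == target:
            vector.update(window_contexts(tokens, i, size, keep_order))
    return vector

# Hypothetical tagged toy sentence, not taken from the paper's corpus.
sentence = [("the", "DET"), ("dog", "N"), ("chase", "V"),
            ("the", "DET"), ("cat", "N")]

ordered = context_vector(sentence, "chase", size=1, keep_order=True)
bag = context_vector(sentence, "chase", size=1, keep_order=False)
```

Two words can then be compared through any similarity measure over their context vectors. The ordered variant keeps "dog to the left of chase" distinct from "dog to the right of chase", while the bag variant merges them.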

This work has been supported by Ministerio de Educación y Ciencia of Spain, within the project ExtraLex, ref: PGIDIT07PXIB204015PR.

Editor information

António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gamallo Otero, P. (2008). Comparing Window and Syntax Based Strategies for Semantic Extraction. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_5

  • DOI: https://doi.org/10.1007/978-3-540-85980-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85979-6

  • Online ISBN: 978-3-540-85980-2

  • eBook Packages: Computer Science, Computer Science (R0)
