Comparing Window and Syntax Based Strategies for Semantic Extraction

Gamallo Otero, Pablo

doi:10.1007/978-3-540-85980-2_5


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5190)

Abstract

In this paper, we describe and compare two approaches to extracting similar words from large corpora. In particular, we compare a method based on syntactic contexts with two strategies relying on windows of tagged words, one using word order and the other bags of words. On a Portuguese corpus of 12 million words, syntactic contexts produce significantly better results for both frequent and less frequent words.
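
To make the three kinds of context concrete, here is a minimal Python sketch, not the paper's implementation. The abstract does not give the window size, the feature weighting, the similarity measure, or the parser output format, so everything below is an assumption chosen for illustration: the function names (window_contexts, syntactic_contexts, cosine), a window of one token, raw counts, cosine similarity, and hand-written dependency triples of the form (head, relation, dependent).

```python
from collections import Counter, defaultdict
from math import sqrt


def window_contexts(sentences, size=1, keep_order=False):
    """Count context features within `size` tokens of each word.

    `sentences` is a list of sentences, each a list of (word, tag) pairs.
    With keep_order=False the feature is a bag-of-words item (word, tag);
    with keep_order=True it also records the side ("L" or "R") of the neighbour.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, (word, _tag) in enumerate(sentence):
            start = max(0, i - size)
            end = min(len(sentence), i + size + 1)
            for j in range(start, end):
                if j == i:
                    continue
                neighbour, neighbour_tag = sentence[j]
                if keep_order:
                    side = "L" if j < i else "R"
                    feature = (side, neighbour, neighbour_tag)
                else:
                    feature = (neighbour, neighbour_tag)
                vectors[word][feature] += 1
    return vectors


def syntactic_contexts(triples):
    """Count context features from (head, relation, dependent) triples.

    The triple format is hypothetical; a real parser's output would need
    to be converted into it first.
    """
    vectors = defaultdict(Counter)
    for head, relation, dependent in triples:
        vectors[head][(relation, "down", dependent)] += 1
        vectors[dependent][(relation, "up", head)] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy data (invented for this sketch): two tagged fragments and two triples.
sentences = [
    [("dog", "N"), ("bites", "V")],
    [("bites", "V"), ("man", "N")],
]
triples = [("bites", "subj", "dog"), ("bites", "obj", "man")]

bag = window_contexts(sentences, size=1, keep_order=False)
ordered = window_contexts(sentences, size=1, keep_order=True)
syntax = syntactic_contexts(triples)

bag_sim = cosine(bag["dog"], bag["man"])              # same neighbour, order ignored
order_sim = cosine(ordered["dog"], ordered["man"])    # neighbour on opposite sides
syntax_sim = cosine(syntax["dog"], syntax["man"])     # different relations to the verb
```

On this toy input the bag-of-words window treats "dog" and "man" as identical because both occur next to "bites", while the word-order window and the syntactic contexts keep them apart, since one precedes the verb as its subject and the other follows it as its object. The sketch only shows how the feature spaces differ; it says nothing about which strategy performs better on real corpora.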

This work has been supported by Ministerio de Educación y Ciencia of Spain, within the project ExtraLex, ref: PGIDIT07PXIB204015PR.



Author information

Authors and Affiliations

Departamento de Língua Espanhola, Faculdade de Filologia, Universidade de Santiago de Compostela, Galiza, Spain
Pablo Gamallo Otero


Editor information

António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma


About this paper

Cite this paper

Gamallo Otero, P. (2008). Comparing Window and Syntax Based Strategies for Semantic Extraction. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_5


DOI: https://doi.org/10.1007/978-3-540-85980-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85979-6
Online ISBN: 978-3-540-85980-2
eBook Packages: Computer Science, Computer Science (R0)
