Automatic Construction of a Dynamic Thesaurus for Proper Names

  • Chapter
Computational Linguistics

Part of the book series: Studies in Computational Intelligence (SCI, volume 458)

Abstract

Proper names outnumber all other word classes, but they are underrepresented in dictionaries and in electronic resources. We propose an automatic method of associating a stand-alone proper name repository with a wordnet. A variety of sources of lexical-semantic knowledge can be harnessed for this task. Semantic proximity between a proper name and a synset can be based on pattern-driven search and on distributional analyses in large corpora. The sources are heterogeneous, and the measures of semantic relatedness vary widely. We propose a flexible method, an adaptation of the Activation-Area Attachment algorithm, which treats each type of source slightly differently. We reach 80% precision in linking proper names with places in a wordnet, even if the targets are highly polysemous.
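
The idea of combining heterogeneous evidence can be illustrated with a minimal sketch. It is not the Activation-Area Attachment algorithm and does not come from the chapter: it only shows one simple way to weight each source type differently when ranking candidate synsets for a proper name. The source labels, the weights, the synset identifiers, and the function name `rank_synsets` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-source weights (an assumption; the chapter gives no such values).
SOURCE_WEIGHTS = {"pattern": 1.0, "distributional": 0.5}


def rank_synsets(evidence, weights=SOURCE_WEIGHTS):
    """Sum weighted evidence per candidate synset and return candidates, best first.

    evidence: iterable of (source, synset_id, score) triples, score in [0, 1].
    Evidence from a source with no configured weight contributes nothing.
    """
    totals = defaultdict(float)
    for source, synset_id, score in evidence:
        totals[synset_id] += weights.get(source, 0.0) * score
    # Sort by descending score; break ties alphabetically by synset id.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Toy evidence for a single proper name (made-up scores, for illustration only).
evidence = [
    ("pattern", "city.n.1", 0.5),
    ("distributional", "city.n.1", 0.5),
    ("distributional", "river.n.1", 0.5),
]
ranking = rank_synsets(evidence)
```

In this toy setup the pattern-based evidence counts twice as much as the distributional evidence, so a synset supported by both sources ranks above one supported by distributional similarity alone.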

Author information

Corresponding author

Correspondence to Roman Kurc.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kurc, R., Piasecki, M., Szpakowicz, S. (2013). Automatic Construction of a Dynamic Thesaurus for Proper Names. In: Przepiórkowski, A., Piasecki, M., Jassem, K., Fuglewicz, P. (eds) Computational Linguistics. Studies in Computational Intelligence, vol 458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34399-5_4

  • DOI: https://doi.org/10.1007/978-3-642-34399-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34398-8

  • Online ISBN: 978-3-642-34399-5

  • eBook Packages: Engineering, Engineering (R0)
