Abstract
Proper names outnumber all other word classes, but they are underrepresented in dictionaries and in electronic resources. We propose an automatic method of associating a stand-alone proper name repository with a wordnet. A variety of sources of lexical-semantic knowledge can be harnessed for this task. Semantic proximity between a proper name and a synset can be estimated from pattern-driven search and from distributional analyses of large corpora. The sources are heterogeneous and their measures of semantic relatedness vary widely. We propose a flexible method, an adaptation of the Activation Area Attachment algorithm, which treats each type of source slightly differently. We reach 80% precision in linking proper names with places in a wordnet, even when the targets are highly polysemous.
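The general idea of combining heterogeneous evidence and letting support spread over the wordnet graph can be illustrated with a minimal sketch. This is not the authors' algorithm: the function `attach`, the source names, the per-source weights, the `decay` factor and the toy synsets are all hypothetical, chosen only to show how a candidate synset might gain support both from direct evidence and from evidence attached to its neighbours.

```python
from collections import defaultdict


def attach(name_scores, neighbours, source_weights, decay=0.5):
    """Pick the synset with the highest activation for one proper name.

    name_scores:    {source: {synset: score in [0, 1]}} (hypothetical inputs)
    neighbours:     {synset: [adjacent synsets]}, assumed to be symmetric
    source_weights: {source: weight}, one weight per type of source
    decay:          share of a neighbour's support passed on (an assumption)
    """
    # Direct support: weighted sum of the evidence from each source.
    direct = defaultdict(float)
    for source, scores in name_scores.items():
        for synset, score in scores.items():
            direct[synset] += source_weights[source] * score

    # Candidates are the directly supported synsets and their neighbours.
    candidates = set(direct)
    for synset in list(direct):
        candidates.update(neighbours.get(synset, []))

    # Each candidate also receives a decayed share of its neighbours' support.
    activation = {}
    for synset in candidates:
        spread = sum(direct.get(n, 0.0) for n in neighbours.get(synset, []))
        activation[synset] = direct.get(synset, 0.0) + decay * spread

    best = max(activation, key=activation.get)
    return best, activation


# Toy data: two evidence sources scoring three made-up synsets.
name_scores = {
    "patterns": {"city": 0.6, "river": 0.2},
    "distributional": {"town": 0.5, "river": 0.3},
}
neighbours = {"city": ["town"], "town": ["city"], "river": []}
source_weights = {"patterns": 1.0, "distributional": 0.5}

best, activation = attach(name_scores, neighbours, source_weights)
print(best, activation)
```

In this toy setting the distributional evidence for "town" reinforces the neighbouring synset "city", so the pattern-based and distributional sources support each other even though they point at different synsets.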
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kurc, R., Piasecki, M., Szpakowicz, S. (2013). Automatic Construction of a Dynamic Thesaurus for Proper Names. In: Przepiórkowski, A., Piasecki, M., Jassem, K., Fuglewicz, P. (eds) Computational Linguistics. Studies in Computational Intelligence, vol 458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34399-5_4
DOI: https://doi.org/10.1007/978-3-642-34399-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34398-8
Online ISBN: 978-3-642-34399-5
eBook Packages: Engineering, Engineering (R0)