Abstract
Named Entity Recognition (NER) is a widely studied task in natural language processing. Many approaches have been developed to identify and classify named entity strings in both specific and open domains. Nevertheless, many NER systems must incorporate external modules to resolve the interpretation problems that arise from proper nouns. This article studies ambiguity in Hispanic Nominal Sequences, whose composition poses three main problems: (1) the association of given names and/or surnames; (2) the composition of such elements by means of a connector; and (3) the duality of given name/surname. To gauge the magnitude of the problem, two gazetteers were compiled, one with 93998 given names and the other with 13779 surnames. The gazetteer entries were used as terminal symbols of the proposed grammar to determine the valid interpretations of the nominal sequences; this is done by automatically labeling every element of each nominal sequence.
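The labeling idea can be illustrated with a minimal sketch. It is not the grammar proposed in the paper, which the abstract does not spell out. The sketch assumes tiny hypothetical gazetteers, a hypothetical connector list, and a toy rule stating that a valid sequence is one or more given names followed by one or more surnames, with connectors allowed between elements. The label letters N (given name), S (surname) and C (connector), and the function names, are likewise illustrative.

```python
import re
from itertools import product

# Toy gazetteers (hypothetical; the real ones hold 93998 given names
# and 13779 surnames). Some entries appear in both sets on purpose,
# to show the given name/surname duality.
GIVEN_NAMES = {"maria", "juan", "santiago", "guadalupe"}
SURNAMES = {"santiago", "guadalupe", "perez", "garcia", "rosa"}
CONNECTORS = {"de", "del", "la", "y"}  # assumed connector list

# Assumed toy grammar over label strings:
# one or more given names, then one or more surnames,
# with any number of connectors allowed before a non-initial element.
VALID_SEQUENCE = re.compile(r"N(?:C*N)*(?:C*S)+")


def candidate_labels(token):
    """Return every label the gazetteers allow for a single token."""
    word = token.lower()
    labels = []
    if word in GIVEN_NAMES:
        labels.append("N")
    if word in SURNAMES:
        labels.append("S")
    if word in CONNECTORS:
        labels.append("C")
    return labels


def interpretations(sequence):
    """List all label strings for the sequence that the toy grammar accepts."""
    options = [candidate_labels(token) for token in sequence.split()]
    if not options or any(not labels for labels in options):
        return []  # empty input or a token unknown to the gazetteers
    labelings = ("".join(combo) for combo in product(*options))
    return [labeling for labeling in labelings if VALID_SEQUENCE.fullmatch(labeling)]


if __name__ == "__main__":
    print(interpretations("Juan Santiago Guadalupe"))
    print(interpretations("Maria de la Rosa"))
```

Under these assumptions, "Juan Santiago Guadalupe" keeps two valid labelings, because both "Santiago" and "Guadalupe" are listed as given name and as surname, while "Maria de la Rosa" keeps only one. Enumerating all labelings is exponential in the number of ambiguous tokens; that is acceptable for short nominal sequences but would not suit long inputs.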
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barceló, G., Cendejas, E., Sidorov, G., Bolshakov, I.A. (2009). Formal Grammar for Hispanic Named Entities Analysis. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_15
DOI: https://doi.org/10.1007/978-3-642-00382-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00381-3
Online ISBN: 978-3-642-00382-0
eBook Packages: Computer Science, Computer Science (R0)