Abstract
This paper presents a collection of texts manually annotated with named entities in context, which was used for HAREM, the first evaluation contest for named entity recognizers for Portuguese. We discuss the options taken and the originality of our approach compared with previous evaluation initiatives in the area. We document the choice of categories, their quantitative weight in the overall collection and how we deal with vagueness and underspecification.
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Santos, D., Cardoso, N. (2006). A Golden Resource for Named Entity Recognition in Portuguese. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_8
DOI: https://doi.org/10.1007/11751984_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34045-4
Online ISBN: 978-3-540-34046-1
eBook Packages: Computer Science, Computer Science (R0)