Building a Dictionary of Anthroponyms

  • Jorge Baptista
  • Fernando Batista
  • Nuno Mamede
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


This paper presents a methodology for building an electronic dictionary of anthroponyms of European Portuguese (DicPRO), a useful resource for computational processing, given the importance of names in structuring the information in texts. The dictionary has been enriched with morphosyntactic and semantic information. It was then used in the specific task of capitalizing anthroponyms and other proper names in a corpus automatically produced by a broadcast news speech recognition system and manually corrected. The output of this system offers no clues such as capitalized words or punctuation. The task is intended to make the output of such a system more readable. The paper shows that combining lexical, contextual (positional) and statistical information, rather than relying on only one of these strategies, achieves better results in this task.
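
As a rough illustration of the combination the abstract describes, the Python sketch below capitalizes tokens in lowercase, unpunctuated text using three kinds of evidence: a lexicon lookup, a positional context rule, and a capitalization probability of the kind one might estimate from a training corpus. It is not the authors' method. The lexicon entries, trigger words, probabilities, threshold, and the function name capitalize_names are all hypothetical and are not taken from DicPRO or the paper.

```python
# Hypothetical toy resources; none of these values come from DicPRO or the paper.
NAME_LEXICON = {"jorge", "fernando", "nuno", "maria", "baptista"}

# Words assumed to precede a personal name often (positional evidence).
TITLE_TRIGGERS = {"senhor", "doutor", "presidente", "ministro"}

# Assumed P(capitalized | word), as if estimated from a training corpus.
# Ambiguous words such as "costa" (surname or common noun) get a low value.
CAP_PROBABILITY = {"maria": 0.98, "jorge": 0.99, "silva": 0.70, "costa": 0.35}


def capitalize_names(tokens, threshold=0.5, context_boost=0.3):
    """Return a copy of tokens with likely proper names capitalized."""
    output = []
    previous_was_name = False
    for i, word in enumerate(tokens):
        in_lexicon = word in NAME_LEXICON
        is_candidate = in_lexicon or word in CAP_PROBABILITY

        # Statistical evidence, falling back to the lexicon when no
        # corpus estimate is available for the word.
        score = CAP_PROBABILITY.get(word, 1.0 if in_lexicon else 0.0)

        # Positional evidence: a candidate that follows a title word or
        # another word already judged to be a name gets a boost.
        after_trigger = i > 0 and tokens[i - 1] in TITLE_TRIGGERS
        if is_candidate and (after_trigger or previous_was_name):
            score += context_boost

        is_name = score >= threshold
        output.append(word.capitalize() if is_name else word)
        previous_was_name = is_name
    return output


if __name__ == "__main__":
    sentence = "o ministro jorge costa visitou a costa ontem".split()
    print(" ".join(capitalize_names(sentence)))
```

In this toy setting the first "costa" follows a recognized name and is capitalized, while the second one has no supporting context and stays lowercase. This is the kind of case where a single source of evidence would not be enough.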


Keywords: Training Corpus; Name Entity Recognition; Lexical Ambiguity; Evaluation Corpus; Decision List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jorge Baptista (1, 2)
  • Fernando Batista (1, 3)
  • Nuno Mamede (1, 4)

  1. L2F – Laboratório de Sistemas de Língua Falada – INESC ID Lisboa, Lisboa, Portugal
  2. Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, Faro, Portugal
  3. ISCTE – Instituto de Ciências do Trabalho e da Empresa, Lisboa, Portugal
  4. Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, Portugal
