SIEMÊS – A Named-Entity Recognizer for Portuguese Relying on Similarity Rules

  • Luís Sarmento
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


In this paper we describe SIEMÊS, a named-entity recognition system for Portuguese that bases its classification procedure on a set of similarity rules. These rules try to obtain soft matches between candidate entities found in text and instances contained in a wide-scope gazetteer, and they avoid the need to code large sets of rules by exploiting lexical similarities. Using this matching procedure, SIEMÊS generates a set of classification hypotheses based solely on internal evidence, which may be disambiguated in a later step by relatively simple rules based on contextual clues. We explain the architecture of SIEMÊS and its named-entity identification and classification procedure. We also briefly discuss the results of the participation of SIEMÊS in HAREM, the named-entity evaluation contest for Portuguese, and describe future work.
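The two-step idea in the abstract (soft matching against a gazetteer to produce hypotheses, then contextual disambiguation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy gazetteer, the token-overlap (Jaccard) similarity, the 0.3 threshold, and the context cues are hypothetical and are not the similarity rules, the gazetteer, or the contextual rules that SIEMÊS actually uses.

```python
from collections import defaultdict

# Hypothetical toy gazetteer (name -> category); not taken from any real resource.
GAZETTEER = {
    "Universidade do Porto": "ORGANIZATION",
    "Universidade de Lisboa": "ORGANIZATION",
    "Porto": "LOCATION",
    "Rio Douro": "LOCATION",
}

# Hypothetical contextual cues: the word before the candidate suggests a category.
CONTEXT_CUES = {"em": "LOCATION", "pela": "ORGANIZATION"}


def token_overlap(a, b):
    """Jaccard similarity between the lowercased token sets of two names.

    This stands in for a similarity rule; it is an assumed, simplified measure.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def hypotheses(candidate, gazetteer=GAZETTEER, threshold=0.3):
    """Return {category: best similarity} using internal evidence only."""
    scores = defaultdict(float)
    for name, category in gazetteer.items():
        sim = token_overlap(candidate, name)
        if sim >= threshold:
            scores[category] = max(scores[category], sim)
    return dict(scores)


def disambiguate(hyps, previous_word):
    """Pick one category: prefer a cue-supported hypothesis, else the best score."""
    if not hyps:
        return None
    cued = CONTEXT_CUES.get(previous_word.lower())
    if cued in hyps:
        return cued
    return max(hyps, key=hyps.get)


if __name__ == "__main__":
    # An unseen name still receives a hypothesis through its lexical
    # similarity to "Universidade de Lisboa".
    print(hypotheses("Universidade de Coimbra"))
    # "Porto" yields two hypotheses; the preceding word decides between them.
    ambiguous = hypotheses("Porto")
    print(disambiguate(ambiguous, "em"), disambiguate(ambiguous, "pela"))
```

In this sketch a candidate absent from the gazetteer can still be classified because it shares tokens with a listed instance, which is the sense in which soft matching replaces large hand-written rule sets; context is consulted only when several hypotheses survive.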


Keywords: Machine Translation; Semantic Role; Similarity Rule; Name Entity Recognition; Entity Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Luís Sarmento (1)
  1. Faculdade de Engenharia Universidade Porto (NIAD&R) & Linguateca (Porto Node), Portugal
