Multi-level NER for Portuguese in a CG Framework

  • Eckhard Bick
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2721)

Abstract

This paper describes and evaluates a linguistically based NER system for Portuguese that draws on lexico-semantic information, pattern matching, and morphosyntactic, context-driven Constraint Grammar rules. Preliminary F-scores for cross-domain news texts, when distinguishing six different name types, were 91.85 (raw) and 93.6 (subtyping of ready-chunked proper nouns).

Keywords

Name Entity Recognition, Noun Head, Entity Recognition, Proper Noun, Syntactic Context
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Eckhard Bick
    Institute of Language and Communication, Southern Denmark University, Denmark
