The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011

  • Giuseppe Attardi
  • Giacomo Berardi
  • Stefano Dei Rossi
  • Maria Simi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7689)

Abstract

The Tanl Tagger is a flexible sequence labeller based on a Conditional Markov Model. It can be configured to use different classifiers and to extract features according to feature templates, expressed as patterns in a configuration file. The Tanl Tagger was applied to the Named Entity Recognition (NER) on Transcribed Broadcast News task of Evalita 2011. The goal of the task was to identify named entities in texts produced by an Automatic Speech Recognition (ASR) system. Such texts lack capitalization, punctuation, and even sentence segmentation, and the transcription is often noisy, which makes them a challenge for state-of-the-art NER tools. We report on the results of our experiments using the Tanl Tagger, as well as another widely available tagger, in both the closed and open modalities.
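To make the general idea concrete, the following is a minimal sketch of a Conditional Markov Model tagger with template-driven feature extraction and dynamic-programming (Viterbi) decoding. It is not the Tanl Tagger's code or configuration syntax: the template format, the attribute names (`form`, `suffix2`), the toy weights, and all function names are assumptions made purely for illustration, and the local classifier is a hand-weighted softmax standing in for a trained model.

```python
import math

# Hypothetical feature templates: (attribute name, token offset relative to
# the current position). A real tagger would read these from a config file.
TEMPLATES = [("form", -1), ("form", 0), ("form", 1), ("suffix2", 0)]

# Toy tag set and hand-picked weights, for illustration only.
TAGS = ["O", "B-LOC"]
TOY_WEIGHTS = {
    ("form[0]=vado", "O"): 2.0,
    ("form[0]=a", "O"): 2.0,
    ("form[0]=roma", "B-LOC"): 3.0,
}


def attribute(tokens, i, name):
    """Return the named attribute of token i, or a padding symbol."""
    if i < 0 or i >= len(tokens):
        return "<pad>"
    if name == "form":
        return tokens[i]
    if name == "suffix2":
        return tokens[i][-2:]
    raise ValueError(f"unknown attribute: {name}")


def extract_features(tokens, i, prev_tag):
    """Instantiate every template at position i and add the previous tag."""
    feats = [f"{name}[{off}]={attribute(tokens, i + off, name)}"
             for name, off in TEMPLATES]
    feats.append(f"prev_tag={prev_tag}")
    return feats


def make_log_prob(weights, tags):
    """Build log P(tag | features) as a softmax over summed feature weights."""
    def log_prob(feats, tag):
        scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
        log_z = math.log(sum(math.exp(s) for s in scores.values()))
        return scores[tag] - log_z
    return log_prob


def viterbi(tokens, tags, log_prob, start="<s>"):
    """Find the tag sequence maximizing the product of local conditionals."""
    if not tokens:
        return []
    # best[i][t]: best log score of any tag sequence ending in t at position i
    best = [{t: log_prob(extract_features(tokens, 0, start), t) for t in tags}]
    back = []
    for i in range(1, len(tokens)):
        scores, pointers = {}, {}
        for t in tags:
            cands = {p: best[-1][p] + log_prob(extract_features(tokens, i, p), t)
                     for p in tags}
            pointers[t] = max(cands, key=cands.get)
            scores[t] = cands[pointers[t]]
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Lowercased, unpunctuated input, as in ASR output.
    tokens = ["vado", "a", "roma"]
    print(viterbi(tokens, TAGS, make_log_prob(TOY_WEIGHTS, TAGS)))
```

Because the previous tag is itself a feature of the local classifier, swapping in a different classifier only requires replacing `make_log_prob`; the decoder is unchanged.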

Keywords

Named Entity Recognition; Maximum Entropy; Conditional Markov Model; dynamic programming; sequence labeling

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Giuseppe Attardi (1)
  • Giacomo Berardi (1)
  • Stefano Dei Rossi (1)
  • Maria Simi (1)
  1. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
