Spoken Information Extraction from Italian Broadcast News

  • Vanessa Sandrini
  • Marcello Federico
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2633)

Abstract

Current research on information extraction from spoken documents focuses mainly on recognizing named entities, such as names of organizations, locations, and persons, in transcripts automatically generated by a speech recognizer. In this work we present research carried out at ITC-irst on named entity recognition in Italian broadcast news. In particular, we describe an original statistical named entity tagger that can be trained with relatively few language resources: a seed list of named entities and a large untagged text corpus. The paper also presents and discusses named entity recognition experiments on case-sensitive automatic transcripts generated by the ITC-irst speech recognizer, with the named entity model trained on seed lists of different sizes.

Keywords

Name Entity Recognition; Entity Recognition; Word Error Rate; Broadcast News; Speech Recognizer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Vanessa Sandrini (1)
  • Marcello Federico (1)
  1. ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, Povo, Trento, Italy