An IR-Inspired Approach to Recovering Named Entity Tags in Broadcast News

  • Niraj Shrestha
  • Ivan Vulić
  • Marie-Francine Moens
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8201)

Abstract

We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect alignments between speech documents and corresponding, highly similar written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method finds named entities missing from the transcribed speech data and also corrects incorrectly assigned named entity tags. Consequently, our approach improves on state-of-the-art NER results for speech data in terms of both recall and precision.
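The two steps can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions, all of them hypothetical.

- Alignment uses cosine similarity over bag-of-words vectors with an arbitrary threshold of 0.3.
- The written article's named entities are already available as a dictionary mapping entity strings to tags, for instance from a standard text NER tagger.
- Recovery uses a simple approximate-string-matching rule in place of a full IR term-expansion model. Each article entity plays the role of a candidate expansion term and is accepted only where it matches a transcript span with a `difflib` similarity ratio of at least 0.8.
- The helper names `align`, `recover_entities` and `merge_tags` are illustrative only.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Lowercased word tokens; a deliberately simple tokenizer for the sketch.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters (missing keys count as 0).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(transcript, articles, threshold=0.3):
    # Step 1 (assumed): pick the written article most similar to the transcript,
    # or None if even the best match falls below the hypothetical threshold.
    t_vec = Counter(tokenize(transcript))
    best_idx, best_sim = None, 0.0
    for i, article in enumerate(articles):
        sim = cosine(t_vec, Counter(tokenize(article)))
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx if best_sim >= threshold else None


def recover_entities(transcript, article_entities, min_ratio=0.8):
    # Step 2 (assumed): treat each entity of the aligned article as a candidate
    # expansion term and tag every transcript span of the same token length
    # whose character-level similarity ratio reaches min_ratio.
    tokens = tokenize(transcript)
    recovered = []
    for entity, tag in article_entities.items():
        ent_tokens = tokenize(entity)
        n = len(ent_tokens)
        if n == 0:
            continue
        target = " ".join(ent_tokens)
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n])
            if SequenceMatcher(None, span, target).ratio() >= min_ratio:
                recovered.append((i, i + n, tag))
    return sorted(recovered)


def merge_tags(baseline, recovered):
    # Hypothetical correction rule: a recovered span replaces any baseline span
    # it overlaps; spans are (start, end, tag) with exclusive end offsets.
    kept = [b for b in baseline
            if all(b[1] <= r[0] or b[0] >= r[1] for r in recovered)]
    return sorted(kept + recovered)
```

Consider the transcript `"the president barack obamma met angela merkel in berlin today"` aligned to the article `"Barack Obama met Angela Merkel in Berlin on Tuesday."`. The misrecognized span "barack obamma" still matches "Barack Obama" closely enough to receive the PER tag. `merge_tags` also lets a recovered PER tag replace a wrong baseline tag on "angela merkel", mirroring in spirit the correction of incorrectly assigned tags described above.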

Keywords

Named entity recognition, term expansion, broadcast news, speech data

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Niraj Shrestha (1)
  • Ivan Vulić (1)
  • Marie-Francine Moens (1)
  1. Department of Computer Science, KU Leuven, Belgium