Tree-Structured Named Entities Extraction from Competing Speech Transcriptions

  • Davy Weissenbacher
  • Christian Raymond
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9103)


When real applications work with automatic speech transcriptions, the main source of error is not the application's own analysis but the noise in the transcriptions. This study presents a simple but effective method for generating a new, better-quality transcription by combining utterances from competing transcriptions. We have extended a structured Named Entity (NE) recognizer submitted to the ETAPE Challenge. Working on French TV and radio programs, our system revises the transcriptions it is given by making use of the NEs it has detected. Our results suggest that combining the transcribed utterances that optimize the F-measure, rather than those that minimize the WER, yields a better transcription for NE extraction. The results show a small but significant improvement of 0.9% SER over the baseline system on the ROVER transcription. This is the best performance reported to date on this corpus.
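The contrast between the two selection criteria can be illustrated with a toy sketch. It is not the system described here: the gazetteer tagger `toy_entities`, the function `pick`, the two hypothetical systems "A" and "B", and the example utterance are all assumptions made for illustration. The selection is also an oracle one, since it compares against a reference that a real system would not have at decoding time. It only shows that the hypothesis with the lowest WER is not necessarily the one with the best NE F-measure.

```python
from typing import Dict, List, Set, Tuple

def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1]

def wer(ref: List[str], hyp: List[str]) -> float:
    """Word error rate of hyp against ref."""
    return word_errors(ref, hyp) / max(len(ref), 1)

# Hypothetical gazetteer standing in for a real NE recognizer.
GAZETTEER = {"paris": "loc", "hollande": "pers"}

def toy_entities(words: List[str]) -> Set[Tuple[str, str]]:
    """Return (label, word) pairs for every word found in the gazetteer."""
    return {(GAZETTEER[w], w) for w in words if w in GAZETTEER}

def f_measure(ref_ents: Set[Tuple[str, str]], hyp_ents: Set[Tuple[str, str]]) -> float:
    """Harmonic mean of precision and recall over entity sets."""
    if not ref_ents and not hyp_ents:
        return 1.0
    correct = len(ref_ents & hyp_ents)
    if correct == 0:
        return 0.0
    precision = correct / len(hyp_ents)
    recall = correct / len(ref_ents)
    return 2 * precision * recall / (precision + recall)

def pick(ref: List[str], hyps: Dict[str, List[str]], by: str) -> str:
    """Oracle choice of one system's utterance, by lowest WER or highest NE F-measure."""
    if by == "wer":
        return min(hyps, key=lambda s: wer(ref, hyps[s]))
    ref_ents = toy_entities(ref)
    return max(hyps, key=lambda s: f_measure(ref_ents, toy_entities(hyps[s])))

# Invented example: system A misrecognizes the entity "paris",
# system B drops two function words but keeps both entities.
ref = "le président hollande arrive à paris".split()
hyps = {
    "A": "le président hollande arrive à pari".split(),
    "B": "président hollande arrive paris".split(),
}

best_by_wer = pick(ref, hyps, "wer")
best_by_f = pick(ref, hyps, "f")
print(best_by_wer, best_by_f)
```

In this invented case, system A makes one word error but loses an entity, while system B makes two word errors on function words and keeps both entities, so the two criteria choose different utterances.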

Index Terms

Speech transcription, Structured named entities, Multi-pass decoding



We thank Dr. Abeed Sarker and Dr. Graciela Gonzalez for their helpful comments and remarks.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. INSA - IRISA, INRIA de Rennes, Rennes, France
