Abstract
Automatic speech transcriptions pose serious challenges for NLP systems due to various peculiarities in the data. In this paper, we propose a simple approach for named entity recognition (NER) on speech transcriptions that achieves good results despite these peculiarities. The novelty of our approach is that it emphasizes maximum exploitation of the tokens in the data, as they are. We developed a system to participate in the “NER on Transcribed Broadcast News” (closed) task of the EVALITA 2011 evaluation campaign, where it was one of the best systems, obtaining an F1-score of 57.02 on the automatic speech transcription test data. On the manual transcriptions of the same test data (which nonetheless contain no sentence boundaries or punctuation symbols), the system achieves an F1-score of 73.54. This is quite high considering that the system is language independent and uses no external dictionaries, gazetteers or ontologies.
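The F1-scores reported in the abstract are entity-level measures. Below is a minimal sketch of how such a score is commonly computed. It assumes exact-match scoring, where a predicted entity counts as correct only if both its token span and its type match a gold entity; the official EVALITA scorer may differ in details. The `(start, end, type)` tuple representation and the example entities are hypothetical and serve only as illustration.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token_exclusive, type).
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return exact-match entity-level precision, recall and F1 as percentages."""
    # A prediction is a true positive only if span AND type match a gold entity.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f1


if __name__ == "__main__":
    # Illustrative (made-up) gold and system entities for one transcript.
    gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG"), (14, 15, "GPE")}
    predicted = {(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG")}
    p, r, f = entity_f1(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints P=66.67 R=50.00 F1=57.14
```

In this toy example, the entity at tokens 5–6 has the correct span but the wrong type, so under exact matching it counts as both a false positive and a false negative.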
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chowdhury, M.F.M. (2013). A Simple Yet Effective Approach for Named Entity Recognition from Transcribed Broadcast News. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds) Evaluation of Natural Language and Speech Tools for Italian. EVALITA 2012. Lecture Notes in Computer Science, vol 7689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35828-9_11
DOI: https://doi.org/10.1007/978-3-642-35828-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35827-2
Online ISBN: 978-3-642-35828-9
eBook Packages: Computer Science, Computer Science (R0)