Abstract
The ability to recognize named entities (e.g., person, location and organization names) in texts has proven important for several natural language processing areas, including Information Retrieval and Information Extraction. However, despite the efforts made and the results achieved in Named Entity Recognition on written texts, recognizing named entities in automatic transcriptions of spoken documents is still far from solved. The output of Automatic Speech Recognition (ASR) often contains transcription errors; in addition, many named entities are out-of-vocabulary words and therefore cannot be produced by the ASR system. This paper presents a comparative analysis of named entity extraction from written texts and from transcriptions. For transcriptions, we used spoken broadcast news; for written texts, we used both newspaper articles from the same domain as the transcriptions and the manual transcriptions of the broadcast news. The comparison was carried out through a number of experiments using the best Named Entity Recognition system presented at Evalita 2007.
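The out-of-vocabulary problem mentioned above can be made concrete with a small measurement. The following is a minimal sketch, assuming a word-based recognizer with a fixed lexicon (so any token outside the lexicon can never appear in its output) and simple whitespace tokenization with lower-casing. The helpers `entity_oov_rate` and `unrecognizable_entities`, as well as the toy vocabulary and mentions, are hypothetical illustrations, not part of the system or data used in the paper.

```python
from typing import Iterable, List, Set


def entity_oov_rate(entities: Iterable[str], asr_vocabulary: Set[str]) -> float:
    """Fraction of named-entity tokens missing from the ASR vocabulary.

    Hypothetical helper: each mention is split on whitespace and
    lower-cased; a token is OOV if it is not in asr_vocabulary.
    """
    tokens: List[str] = [tok.lower() for ent in entities for tok in ent.split()]
    if not tokens:
        return 0.0
    oov = sum(1 for tok in tokens if tok not in asr_vocabulary)
    return oov / len(tokens)


def unrecognizable_entities(entities: Iterable[str], asr_vocabulary: Set[str]) -> List[str]:
    """Mentions containing at least one OOV token.

    Assuming a closed-vocabulary recognizer, such a mention cannot be
    transcribed verbatim, so no NER system can recover it exactly from
    the ASR output.
    """
    return [
        ent for ent in entities
        if any(tok.lower() not in asr_vocabulary for tok in ent.split())
    ]


if __name__ == "__main__":
    # Toy data for illustration only.
    vocab = {"roma", "mario", "rossi", "fiat"}
    mentions = ["Mario Rossi", "Roma", "Finmeccanica", "FIAT"]
    print(entity_oov_rate(mentions, vocab))          # 1 OOV token out of 5
    print(unrecognizable_entities(mentions, vocab))  # the mention with that token
```

With the toy data, one of the five entity tokens ("finmeccanica") is outside the lexicon, giving an OOV rate of 0.2, and only the mention "Finmeccanica" is flagged as impossible to transcribe verbatim. On real data this rate would be computed against the actual recognizer lexicon rather than a hand-written set.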
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Alam, F., Magnini, B., Zanoli, R. (2015). Comparing Named Entity Recognition on Transcriptions and Written Texts. In: Basili, R., Bosco, C., Delmonte, R., Moschitti, A., Simi, M. (eds) Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Studies in Computational Intelligence, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-319-14206-7_4
DOI: https://doi.org/10.1007/978-3-319-14206-7_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14205-0
Online ISBN: 978-3-319-14206-7
eBook Packages: Engineering, Engineering (R0)