Comparing Named Entity Recognition on Transcriptions and Written Texts

Alam, Firoj; Magnini, Bernardo; Zanoli, Roberto

doi:10.1007/978-3-319-14206-7_4

Chapter
First Online: 01 January 2015

Part of the book series: Studies in Computational Intelligence (SCI, volume 589)

Abstract

Recognizing named entities (e.g., person, location, and organization names) in text has proved important for several areas of natural language processing, including Information Retrieval and Information Extraction. However, despite the progress made in Named Entity Recognition on written texts, recognizing named entities in automatic transcriptions of spoken documents is still far from solved. The output of Automatic Speech Recognition (ASR) often contains transcription errors; in addition, many named entities are out-of-vocabulary words, which makes them unavailable to the ASR system. This paper presents a comparative analysis of named entity extraction from written texts and from transcriptions. For transcriptions, we used spoken broadcast news; for written texts, we used both newspapers from the same domain as the transcriptions and manual transcriptions of the broadcast news. The comparison was carried out through a number of experiments using the best Named Entity Recognition system presented at Evalita 2007.

Author information

Authors and Affiliations

FBK-irst, via Sommarive 18, 38123, Povo (TN), Italy
Bernardo Magnini & Roberto Zanoli
SIS Lab, Department of Information Engineering and Computer Science, University of Trento, 38123, Povo (TN), Italy
Firoj Alam

Corresponding author

Correspondence to Firoj Alam.

Editor information

Editors and Affiliations

Department of Computer Science, Systems and Production, University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Department of Computer Science, University of Turin, Turin, Italy
Cristina Bosco
Department of Language and Cultural Studies, Department of Computer Science, Ca’ Foscari University of Venice, Venezia, Italy
Rodolfo Delmonte
Department of Computer Science and Information Engineering, University of Trento, Trento, Italy
Alessandro Moschitti
Department of Computer Science, University of Pisa, Pisa, Italy
Maria Simi

About this chapter

Cite this chapter

Alam, F., Magnini, B., Zanoli, R. (2015). Comparing Named Entity Recognition on Transcriptions and Written Texts. In: Basili, R., Bosco, C., Delmonte, R., Moschitti, A., Simi, M. (eds) Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Studies in Computational Intelligence, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-319-14206-7_4

DOI: https://doi.org/10.1007/978-3-319-14206-7_4
Published: 15 January 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14205-0
Online ISBN: 978-3-319-14206-7
eBook Packages: Engineering, Engineering (R0)
