Comparing Named Entity Recognition on Transcriptions and Written Texts

Part of the book series: Studies in Computational Intelligence (SCI, volume 589)

Abstract

The ability to recognize named entities (e.g., person, location, and organization names) in text has proven important for several areas of natural language processing, including Information Retrieval and Information Extraction. However, despite the progress achieved in Named Entity Recognition on written texts, recognizing named entities in automatic transcriptions of spoken documents is still far from solved. The output of Automatic Speech Recognition (ASR) often contains transcription errors; in addition, many named entities are out-of-vocabulary words, which makes them unavailable to the ASR system. This paper presents a comparative analysis of named entity extraction from written texts and from transcriptions. For transcriptions, we used spoken broadcast news; for written texts, we used both newspapers from the same domain as the transcriptions and manual transcriptions of the broadcast news. The comparison was carried out through a number of experiments using the best Named Entity Recognition system presented at Evalita 2007.

Notes

  1. http://www.evalita.it/

  2. http://textpro.fbk.eu/

  3. http://gate.ac.uk/

  4. http://opennlp.apache.org/

  5. http://www.itl.nist.gov/iad/mig/tests/bnr/1998/

  6. http://chasen.org/~taku/software/yamcha/

  7. http://www.clips.ua.ac.be/conll2002/ner/bin/conlleval.txt

Corresponding author

Correspondence to Firoj Alam.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Alam, F., Magnini, B., Zanoli, R. (2015). Comparing Named Entity Recognition on Transcriptions and Written Texts. In: Basili, R., Bosco, C., Delmonte, R., Moschitti, A., Simi, M. (eds) Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Studies in Computational Intelligence, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-319-14206-7_4

  • DOI: https://doi.org/10.1007/978-3-319-14206-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14205-0

  • Online ISBN: 978-3-319-14206-7

  • eBook Packages: Engineering, Engineering (R0)
