Improving Speech-to-Text Summarization by Using Additional Information Sources

Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Speech-to-text summarization systems usually take as input the output of an automatic speech recognition (ASR) system, which is affected by issues such as speech recognition errors, disfluencies, and difficulties in accurately identifying sentence boundaries. We describe the inclusion of related, solid background information to cope with the difficulties of summarizing spoken language, as well as the use of multi-document summarization techniques in single-document speech-to-text summarization. In this work, we explore the possibilities offered by phonetic information for selecting the background information, and we conduct a perceptual evaluation to better assess the relevance of including that information. Results show that summaries generated using this approach are considerably better than those produced by an up-to-date latent semantic analysis (LSA) summarization method, and suggest that humans prefer summaries restricted to the information conveyed in the input source.

Keywords

Automatic Speech Recognition; News Story; Latent Semantic Analysis; Input Source; Word Error Rate

Notes

Acknowledgements

We would like to thank Fernando Batista for his help with the speech corpus, Joana Paulo Pardal for her help with the web evaluation form, and all the human judges for their invaluable contribution. We also thank the anonymous reviewers for their insightful comments.

This work was partially supported by FCT (INESC-ID multiannual funding) through the PIDDAC Program funds.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. L2F – INESC ID/ISCTE – Instituto Universitário de Lisboa, Lisboa, Portugal
  2. L2F – INESC ID/IST, Lisboa, Portugal
