Improving Speech-to-Text Summarization by Using Additional Information Sources
Abstract
Speech-to-text summarization systems usually take as input the output of an automatic speech recognition (ASR) system that is affected by issues like speech recognition errors, disfluencies, or difficulties in the accurate identification of sentence boundaries. We describe the inclusion of related, solid background information to cope with the difficulties of summarizing spoken language and the use of multi-document summarization techniques in single document speech-to-text summarization. In this work, we explore the possibilities offered by phonetic information to select the background information and conduct a perceptual evaluation to better assess the relevance of the inclusion of that information. Results show that summaries generated using this approach are considerably better than those produced by an up-to-date latent semantic analysis (LSA) summarization method and suggest that humans prefer summaries restricted to the information conveyed in the input source.
Keywords
Automatic Speech Recognition, News Story, Latent Semantic Analysis, Input Source, Word Error Rate
Acknowledgements
We would like to thank Fernando Batista for his help with the speech corpus; Joana Paulo Pardal for her help with the web evaluation form; and all the human judges for their invaluable contribution. We would also like to thank the anonymous reviewers for their insightful comments.
This work was partially supported by FCT (INESC-ID multiannual funding) through the PIDDAC Program funds.