Text Editing for Lecture Speech Archiving on the Web

  • Masashi Ito
  • Tomohiro Ohno
  • Shigeki Matsubara
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5459)


Accumulating spoken documents on the web is of great value to the knowledge society. However, because spontaneous speech is highly redundant, the transcribed text as it stands is hard to read in a web browser and is therefore unsuitable as a web document. This paper proposes a technique for converting spoken documents into web documents, with the aim of building a speech archiving system. The technique edits automatically transcribed texts and improves their readability in the browser. Readable text is generated by applying paraphrasing, segmentation, and structuring to the transcribed texts. An editing experiment using lecture data showed the feasibility of the technique, and a prototype spoken document archiving system was implemented to confirm its effectiveness.


spoken language processing, digital archiving, web contents, paraphrasing, sentence segmentation





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Masashi Ito: Graduate School of Information Science, Nagoya University, Japan
  • Tomohiro Ohno: Graduate School of International Development, Nagoya University, Japan
  • Shigeki Matsubara: Information Technology Center, Nagoya University, Nagoya, Japan
