A Multi-layered Summarization System for Multi-media Archives by Understanding and Structuring of Chinese Spoken Documents

  • Lin-shan Lee
  • Sheng-yi Kong
  • Yi-cheng Pan
  • Yi-sheng Fu
  • Yu-tsun Huang
  • Chien-Chih Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)


Multi-media archives are difficult to display on a screen, and difficult to retrieve and browse. It is therefore important to develop technologies that summarize entire archives of network content to help the user with browsing and retrieval. In a recent paper [1] we proposed a complete set of multi-layered technologies to handle at least some of the above issues: (1) Automatic Generation of Titles and Summaries for each of the spoken documents, such that the spoken documents become much easier to browse; (2) Global Semantic Structuring of the entire spoken document archive, offering the user a global picture of the semantic structure of the archive; and (3) Query-based Local Semantic Structuring for the subset of the spoken documents retrieved by the user's query, providing the user with the detailed semantic structure of the relevant spoken documents given the query entered. Probabilistic Latent Semantic Analysis (PLSA) is found to be helpful. This paper presents an initial prototype system for Chinese archives with the functions mentioned above, in which a broadcast news archive in Mandarin Chinese is taken as the example archive.
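For readers unfamiliar with PLSA, the following is a minimal sketch of the generic model described by Hofmann [11], in which each document is a mixture of latent topics, P(w|d) = Σ_k P(w|z_k) P(z_k|d), fitted by expectation-maximization. It is not the system presented in this paper: the function name `plsa`, its parameters, and the toy count matrix in the usage note are assumptions made purely for illustration.

```python
import random


def plsa(counts, num_topics, iterations=50, seed=0):
    """Fit a generic PLSA model by EM on a document-term count matrix.

    counts[d][w] is the count of word w in document d (list of lists).
    Returns (p_w_given_z, p_z_given_d), where
      p_w_given_z[k][w] = P(w | z_k) and p_z_given_d[d][k] = P(z_k | d).
    Illustrative sketch only; names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    num_docs, num_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialization, normalized into distributions.
    p_w_given_z = [normalize([rng.random() + 1e-3 for _ in range(num_words)])
                   for _ in range(num_topics)]
    p_z_given_d = [normalize([rng.random() + 1e-3 for _ in range(num_topics)])
                   for _ in range(num_docs)]

    for _ in range(iterations):
        # Accumulators start slightly above zero to avoid division by zero.
        new_wz = [[1e-12] * num_words for _ in range(num_topics)]
        new_zd = [[1e-12] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(num_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z_k | d, w) is proportional to
                # P(w | z_k) * P(z_k | d).
                post = normalize([p_w_given_z[k][w] * p_z_given_d[d][k]
                                  for k in range(num_topics)])
                # M-step: accumulate expected counts for each topic.
                for k in range(num_topics):
                    new_wz[k][w] += n * post[k]
                    new_zd[d][k] += n * post[k]
        p_w_given_z = [normalize(row) for row in new_wz]
        p_z_given_d = [normalize(row) for row in new_zd]
    return p_w_given_z, p_z_given_d
```

As a usage example, calling `plsa([[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]], num_topics=2)` returns one word distribution per topic and one topic mixture per document; each returned row sums to 1. Topic mixtures of this kind are the sort of latent-topic representation that semantic structuring of an archive could build on.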


Automatic Generation, News Story, Semantic Structure, Latent Topic, Spontaneous Speech
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Lee, L.-S., Kong, S.-Y., Pan, Y.-C., Fu, Y.-S., Huang, Y.-T.: Multi-layered summarization of spoken document archives by information extraction and semantic structuring. In: Interspeech (2006) (to appear)
  2. Lee, L.-S., Chen, B.: Spoken document understanding and organization. IEEE Signal Processing Magazine 22(5) (September 2005)
  3. CMU Informedia Digital Video Library project [online]. Available:
  4. Multimedia Document Retrieval project at Cambridge University [online]. Available:
  5. Miller, D.R.H., Leek, T., Schwartz, R.: Speech and language technologies for audio indexing and retrieval. Proc. IEEE 88(8), 1338–1353 (2000)
  6. Whittaker, S., Hirschberg, J., Choi, J., Hindle, D., Pereira, F., Singhal, A.: Scan: Designing and evaluating user interface to support retrieval from speech archives. In: Proc. ACM SIGIR Conf. R&D in Information Retrieval, pp. 26–33 (1999)
  7. Merlino, A., Maybury, M.: An empirical study of the optimal presentation of multimedia summaries of broadcast news. In: Mani, I., Maybury, M. (eds.) Automated Text Summarization, pp. 391–401. MIT Press, Cambridge (1999)
  8. SpeechBot Audio/Video Search at Hewlett-Packard (HP) Labs [online]. Available:
  9. Furui, S.: Recent advances in spontaneous speech recognition and understanding. In: Proc. ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition, pp. 1–6 (2003)
  10. Columbia Newsblaster project at Columbia University [online]. Available:
  11. Hofmann, T.: Probabilistic latent semantic analysis. Uncertainty in Artificial Intelligence (1999)
  12. Jin, R., Hauptmann, A.: Automatic title generation for spoken broadcast news. In: Proc. of HLT, pp. 1–3 (2001)
  13. Banko, M., Mittal, V., Witbrock, M.: Headline generation based on statistical translation. In: Proc. of ACL, pp. 318–325 (2000)
  14. Dorr, B., Zajic, D., Schwartz, R.: Hedge trimmer: A parse-and-trim approach to headline generation. In: Proc. of HLT-NAACL, vol. 5, pp. 1–8 (2003)
  15. Furui, S., Kikuchi, T., Shinnaka, Y., Hori, C.: Speech-to-text and speech-to-speech summarization of spontaneous speech. IEEE Trans. on Speech and Audio Processing 12(4), 401–408 (2004)
  16. Hirohata, M., Shinnaka, Y., Iwano, K., Furui, S.: Sentence extraction-based presentation summarization techniques and evaluation metrics. In: Proc. ICASSP, pp. SP–P16.14 (2005)
  17. Kong, S.-Y., Lee, L.-S.: Improved spoken document summarization using probabilistic latent semantic analysis (PLSA). In: Proc. ICASSP (2006) (to appear)
  18. Wang, C.-C.: Improved automatic generation of titles for spoken documents using various scoring techniques. M.S. thesis, National Taiwan University (2006)
  19. Chen, S.-C., Lee, L.-S.: Automatic title generation for Chinese spoken documents using an adaptive k-nearest-neighbor approach. In: Proc. European Conf. Speech Communication and Technology, pp. 2813–2816 (2003)
  20. Li, T.-H., Lee, M.-H., Chen, B., Lee, L.-S.: Hierarchical topic organization and visual presentation of spoken documents using probabilistic latent semantic analysis (PLSA) for efficient retrieval/browsing applications. In: Proc. European Conf. Speech Communication and Technology, pp. 625–628 (2005)
  21. Pan, Y.-C., Wang, C.-C., Hsieh, Y.-C., Lee, T.-H., Lee, Y.-S., Fu, Y.-S., Huang, Y.-T., Lee, L.-S.: A multi-modal dialogue system for information navigation and retrieval across spoken document archives with topic hierarchies. In: Proc. of ASRU, pp. 375–380 (2005)
  22. Chuang, S.-L., Chien, L.-F.: A practical web-based approach to generating topic hierarchy for text segments. In: ACM SIGIR, pp. 127–136 (2004)
  23. Lin, C.-Y.: ROUGE: A package for automatic evaluation of summaries. In: Proc. of Workshop on Text Summarization Branches Out, pp. 74–81 (2004)
  24. Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Honkela, J., Paatero, V., Saarela, A.: Self organization of a massive document collection. IEEE Trans. on Neural Networks 11(3), 574–585 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Lin-shan Lee (1)
  • Sheng-yi Kong (1)
  • Yi-cheng Pan (1)
  • Yi-sheng Fu (1)
  • Yu-tsun Huang (1)
  • Chien-Chih Wang (1)
  1. Speech Lab, College of EECS, National Taiwan University, Taipei
