Text Summarization and Speech Synthesis for the Automated Generation of Personalized Audio Presentations

  • Séamus Lawless
  • Peter Lavin
  • Mostafa Bayomi
  • João P. Cabral
  • M. Rami Ghorab
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9103)


In today’s fast-paced world, users face the challenge of having to consume large amounts of content in a short time. This situation is exacerbated by the fact that content is scattered across a range of different languages and locations. This research addresses these challenges using a number of natural language processing techniques: adapting content using automatic text summarization; enhancing content accessibility through machine translation; and altering the delivery modality through speech synthesis. This paper introduces Lean-back Learning (LbL), an information system that delivers automatically generated audio presentations for consumption in a “lean-back” fashion, i.e., in hands-busy, eyes-busy situations. These presentations are personalized and are generated using multilingual multi-document text summarization. The paper discusses the system’s components and algorithms, in addition to initial system evaluations.


Lean-back learning; Text summarization; Speech synthesis; Multilingual content adaptation, personalization



This research is supported by Science Foundation Ireland (grant 07/CE/I1142) as part of the Centre for Next Generation Localisation at Trinity College, Dublin.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Séamus Lawless (1)
  • Peter Lavin (1)
  • Mostafa Bayomi (1)
  • João P. Cabral (1)
  • M. Rami Ghorab (1)

  1. CNGL Centre for Global Intelligent Content, Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
