Acoustic Modeling of Dialogue Elements for Document Accessibility

  • Pepi Stavropoulou
  • Dimitris Spiliotopoulos
  • Georgios Kouroupetroglou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6768)


Document-to-Audio accessibility assumes that all meaningful presentation elements in the document, such as bold, italics, tables or bullets, should be properly processed and acoustically modeled, in order to convey the intended meaning to the listeners in a complete and adequate manner. Similarly, several types of documents may contain reported speech and dialogue content signaled through punctuation and other visual elements that require further processing before being rendered to speech. This paper explores such dialogue elements in documents, examines their actual indicators and their use, and investigates the most prominent methods for their acoustic modeling, namely the use of prosody manipulation and voice alternation. It further reports on a pilot experiment on the appropriateness of voice alternation as a means for the effective spoken rendition of dialogue elements in documents. Results demonstrate a clear listener preference for the “multiple voice” renditions over the ones using a single voice.
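
As an illustration of the voice alternation idea described above, the following is a minimal sketch, not the authors' implementation. It assumes that dialogue is signaled only by double quotation marks, and that the target synthesizer accepts SSML with the standard `<voice>` element. The voice names `narrator` and `speaker`, and the functions `split_dialogue` and `to_ssml`, are hypothetical and chosen only for this example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical voice names; real names depend on the TTS engine in use.
NARRATOR_VOICE = "narrator"
SPEAKER_VOICE = "speaker"

# Assumption: quoted speech is enclosed in straight or curly double quotes.
QUOTE_RE = re.compile(r'"([^"]*)"|“([^”]*)”')


def split_dialogue(text):
    """Split text into (kind, segment) pairs; kind is 'narration' or 'quote'."""
    segments = []
    pos = 0
    for match in QUOTE_RE.finditer(text):
        # Text between the previous quote and this one is narration.
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("narration", before))
        # Exactly one of the two groups is set, depending on the quote style.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        inner = inner.strip()
        if inner:
            segments.append(("quote", inner))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


def to_ssml(text):
    """Wrap each segment in an SSML <voice> element, alternating by segment kind."""
    parts = []
    for kind, segment in split_dialogue(text):
        voice = SPEAKER_VOICE if kind == "quote" else NARRATOR_VOICE
        parts.append('<voice name="%s">%s</voice>' % (voice, escape(segment)))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml('She said, "Come in." Then she left.'))
```

A single-voice rendition, the baseline in the comparison the abstract reports, would correspond to setting both voice names to the same value. Other indicators of dialogue mentioned in the abstract, such as other punctuation or visual elements, are not handled by this sketch.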


Keywords: Acoustic modeling, document accessibility, dialogue, reported speech, Text to Speech synthesis, voice alternation, Document-to-Audio





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Pepi Stavropoulou (1, 2)
  • Dimitris Spiliotopoulos (1)
  • Georgios Kouroupetroglou (1)

  1. Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
  2. Department of Linguistics, University of Ioannina, Ioannina, Greece
