Abstract
Document-to-Audio accessibility assumes that all meaningful presentation elements in a document, such as bold, italics, tables or bullets, should be properly processed and acoustically modeled in order to convey the intended meaning to listeners completely and adequately. Similarly, several types of documents may contain reported speech and dialogue signaled through punctuation and other visual elements, which require further processing before being rendered to speech. This paper explores such dialogue elements in documents, examines their actual indicators and how they are used, and investigates the most prominent methods for their acoustic modeling, namely prosody manipulation and voice alternation. It further reports on a pilot experiment on the appropriateness of voice alternation as a means for the effective spoken rendition of dialogue elements in documents. Results demonstrate a clear listener preference for the “multiple voice” renditions over the single-voice ones.
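Voice alternation, one of the two methods named above, can be illustrated with a small preprocessing step that marks up dialogue before synthesis. The Python function below is a minimal sketch under stated assumptions, not the method evaluated in the paper. It assumes dialogue is signaled only by straight double quotes, that the target synthesizer accepts SSML 1.1 `<voice>` elements, and that successive quotations strictly alternate between two speakers. The voice names `narrator`, `speaker_a` and `speaker_b` are hypothetical placeholders. A real system would also handle other dialogue indicators, such as dashes and guillemets, and would attribute each quotation to its actual speaker.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical voice names; real names depend on the TTS engine in use.
NARRATOR_VOICE = "narrator"
SPEAKER_VOICES = ["speaker_a", "speaker_b"]

# Assumption: dialogue is marked only by straight double quotes.
QUOTE_RE = re.compile(r'"([^"]+)"')


def _voice(name: str, text: str) -> str:
    # Escape &, < and > so the SSML stays well-formed.
    return f'<voice name="{name}">{escape(text)}</voice>'


def to_ssml(text: str) -> str:
    """Render narration in one voice and alternate voices for quotations."""
    parts = []
    pos = 0
    turn = 0
    for match in QUOTE_RE.finditer(text):
        narration = text[pos:match.start()].strip()
        if narration:
            parts.append(_voice(NARRATOR_VOICE, narration))
        # Naive turn-taking: each new quotation switches speaker.
        speaker = SPEAKER_VOICES[turn % len(SPEAKER_VOICES)]
        parts.append(_voice(speaker, match.group(1)))
        turn += 1
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(_voice(NARRATOR_VOICE, tail))
    return ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
            'xml:lang="en-US">' + "".join(parts) + "</speak>")
```

For example, `to_ssml('She said "Hello" and he replied "Hi".')` yields narration in the `narrator` voice, "Hello" in `speaker_a` and "Hi" in `speaker_b`. Strict alternation breaks down when one speaker's utterance is split by a reporting clause (`"Hello," she said, "how are you?"`), which is one reason speaker attribution matters in practice.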
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stavropoulou, P., Spiliotopoulos, D., Kouroupetroglou, G. (2011). Acoustic Modeling of Dialogue Elements for Document Accessibility. In: Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Applications and Services. UAHCI 2011. Lecture Notes in Computer Science, vol 6768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21657-2_19
DOI: https://doi.org/10.1007/978-3-642-21657-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21656-5
Online ISBN: 978-3-642-21657-2
eBook Packages: Computer Science, Computer Science (R0)