Acoustic Modeling of Dialogue Elements for Document Accessibility

Stavropoulou, Pepi; Spiliotopoulos, Dimitris; Kouroupetroglou, Georgios

doi:10.1007/978-3-642-21657-2_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6768)

Included in the following conference series:

International Conference on Universal Access in Human-Computer Interaction

Abstract

Document-to-Audio accessibility assumes that all meaningful presentation elements in a document, such as bold, italics, tables or bullets, are properly processed and acoustically modeled so that the intended meaning is conveyed to listeners completely and adequately. Similarly, several types of documents contain reported speech and dialogue content, signaled through punctuation and other visual elements, that require further processing before being rendered to speech. This paper explores such dialogue elements in documents, examines their actual indicators and their use, and investigates the most prominent methods for their acoustic modeling, namely prosody manipulation and voice alternation. It further reports on a pilot experiment on the appropriateness of voice alternation as a means for the effective spoken rendition of dialogue elements in documents. Results demonstrate a clear listener preference for the “multiple voice” renditions over those using a single voice.
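
To make the idea of voice alternation concrete, the following is a minimal sketch and not the authors' system. It assumes that dialogue is marked only by double quotation marks, that speakers take strict turns, and that the synthesizer accepts SSML `<voice>` elements. The voice names `narrator`, `speaker_a` and `speaker_b` and the function `to_ssml` are hypothetical and chosen for illustration only.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical voice names; real names depend on the TTS engine in use.
NARRATOR_VOICE = "narrator"
SPEAKER_VOICES = ["speaker_a", "speaker_b"]

# Matches text enclosed in straight or curly double quotation marks.
QUOTE_PATTERN = re.compile(r'"([^"]*)"|“([^”]*)”')


def _voice(name, content):
    """Wrap escaped text in a single SSML <voice> element."""
    return '<voice name="{}">{}</voice>'.format(name, escape(content))


def to_ssml(text):
    """Assign narration to one voice and successive quotations to alternating voices."""
    parts = []
    position = 0
    quote_index = 0
    for match in QUOTE_PATTERN.finditer(text):
        # Text between the previous quotation and this one is narration.
        narration = text[position:match.start()].strip()
        if narration:
            parts.append(_voice(NARRATOR_VOICE, narration))
        quoted = match.group(1) if match.group(1) is not None else match.group(2)
        quoted = quoted.strip()
        if quoted:
            # Simplification: assume the speaker changes with every quotation.
            voice = SPEAKER_VOICES[quote_index % len(SPEAKER_VOICES)]
            parts.append(_voice(voice, quoted))
            quote_index += 1
        position = match.end()
    tail = text[position:].strip()
    if tail:
        parts.append(_voice(NARRATOR_VOICE, tail))
    # The version and namespace attributes of <speak> are omitted for brevity.
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml('"Hello," she said. "Hi," he replied.'))
```

Real documents signal dialogue in more varied ways (dashes, indentation, reporting verbs), so a quotation-mark pattern like this one would only cover the simplest cases.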

Author information

Authors and Affiliations

Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Panepistimiopolis, Ilisia, GR-15784, Athens, Greece
Pepi Stavropoulou, Dimitris Spiliotopoulos & Georgios Kouroupetroglou
Department of Linguistics, University of Ioannina, GR-45110, Ioannina, Greece
Pepi Stavropoulou

Editor information

Editors and Affiliations

Institute of Computer Science, Foundation for Research and Technology - Hellas, N. Plastira 100, Vassilika Vouton, 70013, Heraklion, Crete, Greece
Constantine Stephanidis

About this paper

Cite this paper

Stavropoulou, P., Spiliotopoulos, D., Kouroupetroglou, G. (2011). Acoustic Modeling of Dialogue Elements for Document Accessibility. In: Stephanidis, C. (ed.) Universal Access in Human-Computer Interaction. Applications and Services. UAHCI 2011. Lecture Notes in Computer Science, vol 6768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21657-2_19

DOI: https://doi.org/10.1007/978-3-642-21657-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21656-5
Online ISBN: 978-3-642-21657-2
eBook Packages: Computer Science, Computer Science (R0)
