Abstract
This research aims to help users seek information efficiently while interacting with speech-based content, particularly in multimedia delivery, and reports on an experiment that tested two speech-based designs for communicating multiple speech-based information streams efficiently. The experiment investigated a high-rate playback design and a concurrent playback design. In the high-rate playback design, two speech-based information streams were communicated at double the normal playback rate; in the concurrent playback design, the two streams were played simultaneously. Comprehension of content in both designs was also compared against a benchmark established by a regular-rate baseline condition. The results showed that users' comprehension of the main information dropped significantly in the high-rate playback and concurrent playback designs compared to the baseline condition. However, for questions drawn from the detailed information, comprehension did not differ significantly across the three designs. Such efficient communication methods may increase productivity by delivering information more quickly while users interact with an interactive multimedia system.
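The two playback designs compared in the abstract can be illustrated with a minimal signal-processing sketch. The code below is not the authors' experimental apparatus (the paper used prepared audio material); it is a hypothetical NumPy illustration in which the high-rate design is approximated by naive 2x decimation (real high-rate playback would use pitch-preserving time-stretching, e.g. WSOLA) and the concurrent design by summing two streams with headroom scaling.

```python
import numpy as np


def double_rate(samples: np.ndarray) -> np.ndarray:
    """Crude 2x time compression by dropping every other sample.

    Illustration only: pitch-preserving time-stretching (e.g. WSOLA)
    is what actual high-rate speech playback systems use.
    """
    return samples[::2]


def mix_concurrent(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Play two streams at once: pad to equal length, sum, and
    halve the amplitude to leave headroom against clipping."""
    n = max(len(a), len(b))
    mixed = np.zeros(n)
    mixed[: len(a)] += a
    mixed[: len(b)] += b
    return mixed / 2.0


# Two 1-second dummy "speech" signals at 8 kHz (placeholders for
# real recorded streams).
sr = 8000
t = np.arange(sr) / sr
stream_a = np.sin(2 * np.pi * 220 * t)
stream_b = np.sin(2 * np.pi * 330 * t)

# High-rate design: both streams in sequence, played at 2x speed,
# so total duration equals one stream at normal speed.
fast = double_rate(np.concatenate([stream_a, stream_b]))

# Concurrent design: both streams at normal speed, overlapped.
both = mix_concurrent(stream_a, stream_b)
```

Either design delivers two streams in the time one stream would normally take, which is the efficiency gain the experiment set out to measure against comprehension cost.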
Notes
All percentages in the subsequent sections are reported in rounded form; the graphs and statistical analyses, however, are based on the precise values.
Acknowledgements
This research was supported by the School of Software, Faculty of Engineering and IT, University of Technology Sydney, Australia.
Additional information
Communicated by M. Katsurai.
Cite this article
Abu ul Fazal, M., Ferguson, S. & Johnston, A. Investigating efficient speech-based information communication: a comparison between the high-rate and the concurrent playback designs. Multimedia Systems 26, 621–630 (2020). https://doi.org/10.1007/s00530-020-00669-2