Methods for Human-Centered Evaluation of MediaSync in Real-Time Communication
Abstract
In an ideal world, people interacting over real-time multimedia links would experience perfectly synchronized media with no transmission latency: the interlocutors would hear and see each other without delay. Methods to achieve the former are discussed in other chapters of this book, but for a variety of practical and physical reasons, delay-free communication will never be possible. In some cases the delay is very obvious, either because the listeners’ reaction time, as modified by the delay, can be observed, or because there is acoustic echo from the listeners’ audio equipment. In the absence of echo, however, users do not always explicitly notice the presence of delay, even for quite large values. Typically, they notice that something is wrong (for example, “we kept interrupting each other!”) but are unable to say what it is. Useful insights into the impact of delay on a conversation can be obtained from the linguistic discipline of Conversation Analysis, and especially from the analysis of “turn-taking” in a conversation. This chapter gives an overview of the challenges in evaluating media synchronicity in real-time communication, outlining appropriate tasks and methods for subjective testing and showing how in-depth analysis of such tests can yield a deeper understanding of the effects of delay. The insights are based on recent studies of audio and audiovisual communication, and are complemented by examples from other media synchronization applications such as networked music interaction.
Keywords
Transmission delay, Latency, Subjective evaluation, Test method, Conversation analysis
Definitions
Telemeeting: A meeting in which participants are located in at least two locations and communicate via a telecommunication system. The term telemeeting is used to emphasize that such a meeting is often more flexible and interactive than a conventional business teleconference and could also be a private meeting. A telemeeting can be audio-only, audiovisual, text-based, or a mix of these modes.
Conversational quality: The perceived quality when two or more test participants have a conversation.
Conversation Analysis: The study of social interaction, focusing in particular on conversations.
Turn-constructional unit (TCU): A conversation analysis term describing the fundamental segment of speech in a conversation, essentially a piece of speech that constitutes an entire ‘turn’.
Transition-relevance place (TRP): A conversation analysis term indicating a point at which a turn or floor exchange can take place between speakers.