MediaSync, pp. 229–270

Methods for Human-Centered Evaluation of MediaSync in Real-Time Communication

  • Gunilla Berndtsson
  • Marwin Schmitt
  • Peter Hughes
  • Janto Skowronek
  • Katrin Schoenenberg
  • Alexander Raake


In an ideal world, people interacting over real-time multimedia links would experience perfectly synchronized media and no transmission latency: the interlocutors would hear and see each other without delay. Methods to achieve the former are discussed in other chapters of this book, but for a variety of practical and physical reasons, delay-free communication will never be possible. In some cases the delay is very obvious, either because the listeners’ reaction time, as modified by the delay, can be observed or because there is acoustic echo from the listeners’ audio equipment. In the absence of echo, however, users do not always explicitly notice the delay, even when it is quite large. Typically, they notice that something is wrong (for example, “we kept interrupting each other!”) but are unable to say what it is. Useful insights into the impact of delay on a conversation can be obtained from the linguistic discipline of Conversation Analysis, especially its analysis of “turn-taking”. This chapter gives an overview of the challenges in evaluating media synchronicity in real-time communication. It outlines appropriate tasks and methods for subjective testing and shows how in-depth analysis of such tests can be performed to gain a deep understanding of the effects of delay. The insights are based on recent studies of audio and audiovisual communication, and the chapter also presents examples from other media synchronization applications such as networked music interaction.


Keywords: Transmission delay, Latency, Subjective evaluation, Test method, Conversation analysis



Telemeeting [ITU-T P.1301]

A meeting in which participants are located in at least two locations and the communication takes place via a telecommunication system. The term telemeeting is used to emphasize that such a meeting is often more flexible and interactive than a conventional business teleconference and could also be a private meeting. The telemeeting could be audio-only, audiovisual, text-based, or a mix of these modes.

Conversational quality [ITU-T P.1301]

The perceived quality when two or more test participants have a conversation.

Conversation analysis (CA) [Sacks]

The study of social interaction, in particular focusing on conversations.

Turn construction unit (TCU) [Sacks]

A conversation analysis term describing the fundamental segment of speech in a conversation – essentially a piece of speech that constitutes an entire ‘turn’.

Transition relevance place (TRP) [Sacks]

A conversation analysis term which indicates where a turn or floor exchange can take place between speakers.


References

  1. Barger, R., Church, S., Fukuda, A., Grunke, J., Keislar, D., Moses, B., Novak, B., Pennycook, B., Settel, Z., Strawn, J., Wiser, P., Woszczyk, W.: AES white paper: networking audio and music using internet2 and next generation internet capabilities. Audio Engineering Society, New York (1998)
  2. Bech, S., Zacharov, N.: Perceptual Audio Evaluation - Theory, Method and Application. Wiley
  3. Berndtsson, G., Folkesson, M., Kulyk, V.: Subjective quality assessment of video conferences and telemeetings. Packet video workshop 2012, München, Germany (2012)
  4. Berry, A.: Spanish and American Turn-Taking Styles: a Comparative Study (1994)
  5. Biech, E.: The Pfeiffer book of successful team-building tools: best of the annuals. Pfeiffer (2007)
  6. Brady, P.T.: A technique for investigating on-off patterns of speech. Bell Syst. Tech. J. 44(1), 1–22 (1965)
  7. Brady, P.T.: A statistical analysis of on-off patterns in 16 conversations. Bell Syst. Tech. J. 47(1), 73–99 (1968)
  8. Brady, P.T.: Effects of transmission delay on conversational behavior on echo-free telephone circuits. Bell Syst. Tech. J. 50(1), 115–134 (1971)
  9. Braun, A.M.: Qualitätsaspekte multimodaler Kommunikation: Subjektive und objektive Messungen [engl.: Quality aspects of multimodal communication: Subjective and objective measurements]. Ph.D. thesis, Eidgenössische Technische Hochschule Zürich (2003)
  10. Chafe, C., Wilson, S., Leistikow, R., Chisholm, D., Scavone, G.: A simplified approach to high quality music and sound over IP. In: Proceedings of the International Conference on Digital Audio Effects, Verona, Italy (2000)
  11. Chafe, C., Gurevich, M.: Network Time Delay and Ensemble Accuracy: Effects of Latency, Asymmetry. J. Audio Eng. Soc. (2004)
  12. Clift, R.: Conversation Analysis, Cambridge Textbooks in Linguistics (2016)
  13. Cragan, J.F., Wright, D.W.: Communication in Small Group Discussions: An Integrated Approach. West Publishing Company (1991)
  14. Daly-Jones, O., Monk, A., Watts, L.: Some advantages of video conferencing over high-quality audio conferencing: fluency and awareness of attentional focus. Int. J. Hum Comput Stud. 49, 21–58 (1998)
  15. Duffy, S.: Closer—A study of one-to-one instrumental music tuition through video conference. Media and Arts Technology Programme Project Report. Queen Mary, University of London (2011)
  16. Egger, S., Schatz, R., Scherer, S.: It takes two to tango—assessing the impact of delay on conversational interactivity on perceived speech quality. In: Annual Conference of the Speech Communication Association, pp. 1321–1324 (2010)
  17. Egger, S., Schatz, R., Schoenenberg, K., Raake, A., Kubin, G.: Same but different?—Using speech signal features for comparing conversational VoIP quality studies. In: International Conference on Communications, IEEE, pp. 1320–1324 (2012)
  18. Farner, S., Solvang, A., Sæbø, A., Svensson, U.P.: Ensemble hand-clapping experiments under the influence of delay and various acoustic environments. J. Audio Eng. Soc. 57, 1028–1041 (2009)
  19. Geelhoed, E., Parker, A., Williams, D.J., Groen, M.: Effects of latency on telepresence. Technical report, HPL-2009-120, HP Laboratories (2009)
  20. Guéguin, M., Bouquin-Jeans, R.L., Gautier-Turbin, V., Faucon, G., Barriac, V.: On the evaluation of the conversational speech quality in telecommunications. EURASIP J. Adv. Signal Process. (2008)
  21. Hammer, F., Reichl, P., Raake, A.: Elements of interactivity in telephone conversations. In: International Conference Spoken Language, pp. 1741–1744 (2004)
  22. Hammer, F.: Quality Aspects of Packet-Based Interactive Speech Communication, Ph.D. thesis, Technical University Graz, Vienna (2006)
  23. Huynh-Thu, Q., Garcia, M-N., Speranza, F., Corriveau, P., Raake, A.: Study of rating scales for subjective quality assessment of high-definition video. IEEE Trans. Broadcast. 57, 1–14 (2011)
  24. ITU-T: Handbook on Telephonometry. International Telecommunication Union, Geneva (1993)
  25. ITU-T Recommendation G.107: The E-model: a computational model for use in transmission planning. International Telecommunication Union, Geneva (2015)
  26. ITU-T Recommendation G.114: One-way transmission time. International Telecommunication Union, Geneva (2003)
  27. ITU-T Recommendation P.800: Methods for subjective determination of transmission quality. International Telecommunication Union, Geneva (1996)
  28. ITU-T Recommendation P.805: Subjective evaluation of conversational quality. International Telecommunication Union, Geneva (2007)
  29. ITU-T Recommendation P.831: Subjective performance evaluation of network echo cancellers. International Telecommunication Union, Geneva (1998)
  30. ITU-T Recommendation P.832: Subjective performance evaluation of hands-free terminals. International Telecommunication Union, Geneva (2000)
  31. ITU-T Recommendation P.920: Interactive test methods for audiovisual communications. International Telecommunication Union, Geneva (2000)
  32. ITU-T Recommendation P.1301: Subjective quality evaluation of audio and audiovisual multiparty telemeetings. International Telecommunication Union, Geneva (2012)
  33. ITU-T Recommendation P.1305: Effects of delays on the telemeeting quality. International Telecommunication Union, Geneva (2016)
  34. ITU-T Recommendation P.1312: Method for the measurement of the communication effectiveness of multiparty telemeetings using task performance. International Telecommunication Union, Geneva (2015)
  35. ITU-R Recommendation BT.1359-1: Relative timing of sound and vision for broadcasting. The ITU Radiocommunication Assembly, Geneva (1998)
  36.
  37. Jekosch, U.: Voice and Speech Quality Perception—Assessment and Evaluation. Springer (2005)
  38. Kapur, A., Wang, G., Davidson, P., Cook, P.R.: Interactive network performance: a dream worth dreaming? In: Organised Sound, vol. 10 (3), pp. 209–219. Cambridge University Press (2005)
  39. Kitawaki, N., Itoh, K.: Pure delay effects on speech quality in telecommunications. IEEE J. Sel. Areas Commun. 9(4), 586–593 (1991)
  40. Knapp, M.L., Hall, J.A.: Nonverbal Communication in Human Interaction, 7th edn. Wadsworth, Cengage Learning, Boston, USA (2010)
  41. Krauss, R.M., Bricker, P.E.: Effects of transmission delay and access delay on the efficiency of verbal communication. J. Acoust. Soc. Am. 41(2), 286–292 (1967)
  42. Krauss, R.M., Garlock, C., Bricker, P.D., McMahon, L.: The role of audible and visible back-channel responses in interpersonal communication. Journal of Personality and Social Psychology 35(7), 523–529 (1977)
  43. Le Callet, P., Perkis, A., Möller, S. (eds.): Qualinet White Paper on Definitions of Quality of Experience (2012). European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003). Version 1.2, March 2013
  44. Matejka, J., Glueck, M., Grossman, T., Fitzmaurice, G.: The effect of visual appearance on the performance of continuous sliders and visual analogue scales. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, pp. 5421–5432 (2016)
  45. Möller, S.: Assessment and Prediction of Speech Quality in Telecommunications. Springer (2000)
  46. O’Conaill, B., Whittaker, S., Wilbur, S.: Conversations over videoconferences: an evaluation of the spoken aspects of video mediated interaction. Hum. Comput. Interact. 8(4), 389–428 (1993)
  47. Raake, A.: Speech Quality of VOIP—Assessment and Prediction. Wiley (2006)
  48. Raake, A., Hoeldtke, K., Schlegel, C., Ahrens, J., Geier, M.: Listening and conversational quality of spatial audio conferencing. In: International Conference of the Audio Engineering Society, pp. 4–7 (2010)
  49. Raake, A., Schoenenberg, K., Skowronek, J., Egger, S.: Predicting speech quality based on interactivity and delay. In: Proceedings of 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Lyon, France (2013)
  50. Richards, D.L.: Conversational performance of speech links subject to long propagation times. In: International Conference on Satellite Communication, IEEE, pp. 955–963 (1962)
  51. Riesz, R.R., Klemmer, E.T.: Subjective evaluation of delay and echo suppressors in telephone communications. Bell Syst. Tech. J. 42(6), 2919–2941 (1963)
  52. Ruhleder, K., Jordan, B.: Co-constructing non-mutual realities: delay-generated trouble in distributed interaction. Comput. Support. Coop. Work 10(1), 113–138 (2001)
  53. Sacks, H., Schegloff, E.A., Jefferson, G.: A simplest systematics for the organisation of turn-taking in conversation. Language 50, 696–735 (1974)
  54. Sat, B., Huang, Z., Wah, B.W.: The design of a multi-party VoIP conferencing system over the internet. In: International Symposium on Multimedia, IEEE, pp. 3–10 (2007)
  55. Schmitt, M., Gunkel, S., Cesar, P., Hughes, P.: A QoE testbed for socially-aware video-mediated group communication. In: Proceedings of the 2nd International Workshop on Socially-aware Multimedia, New York, NY, USA, pp. 37–42 (2013)
  56. Schmitt, M., Gunkel, S., Cesar, P., Bulterman, D.: The influence of interactivity patterns on the Quality of Experience in multi-party video-mediated conversations under symmetric delay conditions. In: Proceedings of the 3rd International Workshop on Socially-aware Multimedia, New York, NY, USA (2014)
  57. Schoenenberg, K., Raake, A., Egger, S., Schatz, R.: On interaction behaviour in telephone conversations under transmission delay. Speech Commun. 63–64, 1–14 (2014)
  58. Schoenenberg, K., Schmieder, M.: 3rnt—3-party random number verification (timed version) task (2014)
  59. Schoenenberg, K., Raake, A., Koeppe, J.: Why are you so slow?—Misattribution of transmission delay to attributes of the conversation partner at the far-end. Int. J. Hum.-Comput. Stud. 72(5), 477–487 (2014)
  60. Schoenenberg, K.: The Quality of Mediated-Conversations under Transmission Delay, Ph.D. thesis, Technical University of Berlin (2016)
  61. Skowronek, J., Schiffner, F., Schoenenberg, K., Raake, A.: 3sct 3-party short conversation test scenarios for conferencing assessment (version 03) (2014)
  62. Skowronek, J.: Quality of Experience of Multiparty Conferencing and Telemeeting Systems: Methods and Models for Assessment and Prediction, Ph.D. thesis (2017)
  63.
  64. Tam, J., Carter, E., Kiesler, S., Hodgins, J.: Video increases the perception of naturalness during remote interactions with latency. In: Proceedings of CHI’12, New York, NY, USA, pp. 2045–2050 (2012)
  65. Vartabedian, A.G.: The effects of transmission delay in four-wire teleconferencing. Bell Syst. Tech. J. 45(10), 1673–1688 (1966)
  66. Wah, B.W., Sat, B.: The design of VoIP systems with high perceptual conversational quality. J. Multimed. 4(2), 49–62 (2009)
  67. Wang, J., Yang, F., Xie, Z., Wan, S.: Evaluation on perceptual audiovisual delay using average talkspurts and delay. In: 2010 3rd International Congress on Image and Signal Processing (CISP), vol. 1, pp. 125–128 (2010)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Gunilla Berndtsson (1)
  • Marwin Schmitt (2)
  • Peter Hughes (3)
  • Janto Skowronek (4)
  • Katrin Schoenenberg (5)
  • Alexander Raake (4)

  1. Digital Representation and Interaction, Ericsson Research, Stockholm, Sweden
  2. Centrum Wiskunde & Informatica (CWI), Amsterdam, Netherlands
  3. British Telecom Research and Innovation, Martlesham Heath, Ipswich, UK
  4. Audiovisual Technology Group, Institute of Media Technology, Ilmenau University of Technology, Ilmenau, Germany
  5. Clinical Psychology and Psychotherapy, University of Wuppertal, Wuppertal, Germany
