
Assessing the importance of audio/video synchronization for simultaneous translation of video sequences

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Lip synchronization is considered a key parameter during interactive communication. In the case of video conferencing and television broadcasting, the differential delay between audio and video should remain below certain thresholds, as recommended by several standardization bodies. However, further research has shown that these thresholds can be relaxed, depending on the targeted application and use case. In this article, we investigate the influence of lip sync on the ability to perform real-time language interpretation during video conferencing. Furthermore, we are interested in determining proper lip sync visibility thresholds applicable to this use case. We therefore conducted a subjective experiment with expert interpreters, who were required to perform a simultaneous translation, and with non-experts. Our results show that significant differences arise when conducting subjective experiments with expert interpreters. As the interpreters are primarily focused on performing the simultaneous translation, their lip sync detectability thresholds are higher than the existing recommended thresholds. Primary focus and the targeted application and use case are therefore important factors to consider when selecting proper lip sync acceptability thresholds.
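To make the notion of a differential-delay (skew) budget concrete, the sketch below classifies a measured audio/video skew against detectability and acceptability bounds. This is a minimal illustration, not the procedure used in the study: the LipSyncThresholds container, the classify_skew helper, and the threshold values are assumptions chosen in the spirit of broadcast recommendations such as ITU-R BT.1359, with positive skew meaning audio leads video.

    # Minimal sketch: classifying audio/video differential delay (skew) against
    # illustrative thresholds. Positive skew = audio leads video; negative skew
    # = audio lags video. The values below are assumed placeholders in the
    # spirit of broadcast recommendations, not the limits derived in this study.
    from dataclasses import dataclass

    @dataclass
    class LipSyncThresholds:
        detect_lead_ms: float = 45.0    # detectability bound, audio leading (assumed)
        detect_lag_ms: float = -125.0   # detectability bound, audio lagging (assumed)
        accept_lead_ms: float = 90.0    # acceptability bound, audio leading (assumed)
        accept_lag_ms: float = -185.0   # acceptability bound, audio lagging (assumed)

    def classify_skew(skew_ms: float, t: LipSyncThresholds = LipSyncThresholds()) -> str:
        """Return a coarse label for a given audio/video skew in milliseconds."""
        if t.detect_lag_ms <= skew_ms <= t.detect_lead_ms:
            return "below detectability threshold"
        if t.accept_lag_ms <= skew_ms <= t.accept_lead_ms:
            return "detectable but within acceptability bounds"
        return "outside acceptability bounds"

    if __name__ == "__main__":
        for skew in (0.0, 60.0, -150.0, 220.0):
            print(f"{skew:+7.1f} ms -> {classify_skew(skew)}")

In the context of this article, the central observation is that interpreters who are occupied with the translation task tolerate noticeably larger skews than such generic bounds suggest, so the limits would be relaxed for this particular use case.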




Acknowledgments

The research activities that have been described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT) and the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT). This paper is the result of research carried out as part of the OMUS project funded by the IBBT. OMUS is being carried out by a consortium of the industrial partners Technicolor, Televic, Streamovations and Excentis, in cooperation with the IBBT research groups IBCN, MultimediaLab and WiCa (UGent), SMIT (VUB), PATS (UA), and COSIC (KUL). Glenn Van Wallendael and Jan De Cock would also like to thank the Institute for the Promotion of Innovation through Science and Technology in Flanders for financially supporting their Ph.D. and postdoctoral grant, respectively. The authors would also like to thank Dr. Bart Defrancq, Lecturer and Coordinator of the PP in Conference Interpreting at University College Ghent, for his contributions to this work and support in recruiting the expert test subjects.

Author information


Corresponding author

Correspondence to Nicolas Staelens.

Additional information

Communicated by R. Steinmetz.


About this article

Cite this article

Staelens, N., De Meulenaere, J., Bleumers, L. et al. Assessing the importance of audio/video synchronization for simultaneous translation of video sequences. Multimedia Systems 18, 445–457 (2012). https://doi.org/10.1007/s00530-012-0262-4
