A Synchronization Ground Truth for the Jiku Mobile Video Dataset

  • Mario Guggenberger
  • Mathias Lux
  • Laszlo Böszörmenyi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8936)


This paper introduces and describes a manually generated synchronization ground truth, accurate to the level of the audio sample, for the Jiku Mobile Video Dataset, a collection of hundreds of videos recorded by mobile users at events featuring drama, dance, and singing performances. The ground truth aims to encourage researchers to evaluate their audio, video, or multimodal synchronization methods on a publicly available dataset, to facilitate benchmarking, and to ease the development of mobile video processing methods that depend on an accurately synchronized dataset, such as audio and video quality enhancement, analytics, and summary generation.


Keywords: Audio, Video, Multimedia, Crowd events, Synchronization, Time drift





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mario Guggenberger
  • Mathias Lux
  • Laszlo Böszörmenyi

Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Klagenfurt am Wörthersee, Austria
