Synchronizing multimodal recordings using audio-to-audio alignment

An application of acoustic fingerprinting to facilitate music interaction research

Original Paper · Journal on Multimodal User Interfaces

Abstract

Research on the interaction between movement and music often involves the analysis of multi-track audio, video streams and sensor data. To facilitate such research, a framework is presented here that allows synchronization of multimodal data. A low-cost approach is proposed that synchronizes streams by embedding ambient audio into each data stream, which effectively reduces the synchronization problem to audio-to-audio alignment. As part of the framework, a robust, computationally efficient audio-to-audio alignment algorithm is presented for reliable synchronization of embedded audio streams of varying quality. The algorithm uses audio fingerprinting techniques to measure offsets. It also identifies drift and dropped samples, which makes it possible to find a synchronization solution under such circumstances as well. The framework is evaluated with synthetic signals and a case study, showing millisecond-accurate synchronization.
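The offset measurement described in the abstract can be understood as matching (hash, time) fingerprint pairs between two streams and taking the most frequent time difference as the offset: matches at the true offset pile up in one histogram bin, while spurious hash collisions scatter. The following minimal Java sketch illustrates that matching step only; it is not the Panako/SyncSink implementation. The Fingerprint record, the synthetic hashes in main, and the 8,000 Hz / 128-sample frame parameters (borrowed from note 1 below) are illustrative assumptions; real fingerprints would come from a spectral-peak extraction step that is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetEstimator {

    /** A fingerprint: a hash of local spectral content plus the analysis frame it occurred in. */
    record Fingerprint(long hash, int frameIndex) {}

    /**
     * Estimates the offset (in frames) between a reference stream and another stream:
     * match fingerprints by hash and return the most frequent frame-index difference.
     */
    static int estimateOffsetInFrames(List<Fingerprint> reference, List<Fingerprint> other) {
        // Index the reference fingerprints by hash for fast lookup.
        Map<Long, List<Integer>> byHash = new HashMap<>();
        for (Fingerprint f : reference) {
            byHash.computeIfAbsent(f.hash(), k -> new ArrayList<>()).add(f.frameIndex());
        }
        // Histogram of frame-index differences over all hash matches.
        Map<Integer, Integer> histogram = new HashMap<>();
        for (Fingerprint f : other) {
            for (int refFrame : byHash.getOrDefault(f.hash(), List.of())) {
                histogram.merge(refFrame - f.frameIndex(), 1, Integer::sum);
            }
        }
        // The histogram mode is the offset estimate: true matches accumulate, collisions scatter.
        return histogram.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(0);
    }

    public static void main(String[] args) {
        // Synthetic data: the second stream lags the reference by 125 frames,
        // i.e. 2.0 s at 8,000 Hz with 128-sample frames (parameters from note 1).
        List<Fingerprint> reference = new ArrayList<>();
        List<Fingerprint> other = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            long hash = 1_000_003L * i;   // stand-in for a real spectral hash
            reference.add(new Fingerprint(hash, i));
            other.add(new Fingerprint(hash, i - 125));
        }
        int offsetFrames = estimateOffsetInFrames(reference, other);
        double offsetSeconds = offsetFrames * 128.0 / 8_000.0; // frames x frame length / sample rate
        System.out.printf("Estimated offset: %d frames = %.3f s%n", offsetFrames, offsetSeconds);
    }
}
```

Repeating this estimate over successive windows of the streams would expose drift (a slowly changing offset) or dropped samples (a sudden jump), which the abstract says the algorithm detects. Note also that frame-level matching limits resolution to the frame duration (16 ms here, see note 1); the millisecond accuracy reported in the abstract requires a finer refinement step that this sketch omits.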



Notes

  1. If, for example, audio with an 8,000 Hz sample rate is used and each analysis frame is 128 samples long, then the time resolution is limited to 128 / 8,000 Hz = 16 ms.

  2. SyncSink is included in the GPL-licensed Panako project, available at http://panako.be.

  3. An Intel Core2 Quad Q9650 CPU @ 3.00 GHz (4 cores) with 8 GB of memory was used; this CPU entered the market in late 2008.

  4. To support real-time recording, writing a 512-byte buffer (256 samples at 16 bits per sample) should take less than 256 / 44,100 Hz \(= 5.805\) ms, on average (see the sketch below).
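The budget in note 4 can be sanity-checked empirically. The sketch below is an illustrative assumption, not the authors' benchmark; the temp-file target and iteration count are arbitrary. It times repeated 512-byte writes and compares the average against the \(\approx 5.805\) ms real-time budget:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteBudgetCheck {
    public static void main(String[] args) throws IOException {
        final int iterations = 10_000;
        final byte[] buffer = new byte[512];               // one 512-byte audio buffer
        final double budgetMs = 256.0 / 44_100.0 * 1000.0; // 256 samples at 44,100 Hz ~ 5.805 ms

        Path tmp = Files.createTempFile("write-budget", ".raw");
        long start = System.nanoTime();
        try (FileOutputStream out = new FileOutputStream(tmp.toFile())) {
            for (int i = 0; i < iterations; i++) {
                out.write(buffer);                         // write one buffer, as a recorder would
            }
        }
        double avgMs = (System.nanoTime() - start) / 1_000_000.0 / iterations;
        System.out.printf("Average write: %.4f ms, budget: %.3f ms -> %s%n",
                avgMs, budgetMs, avgMs < budgetMs ? "real-time capable" : "too slow");
        Files.deleteIfExists(tmp);
    }
}
```

On typical hardware the average write time comes out well below the budget, which is why the constraint is easy to meet in practice.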


Author information

Correspondence to Joren Six.

Electronic supplementary material

Supplementary material 1 (7z, 76,651 KB)


Cite this article

Six, J., Leman, M. Synchronizing multimodal recordings using audio-to-audio alignment. J Multimodal User Interfaces 9, 223–229 (2015). https://doi.org/10.1007/s12193-015-0196-1

