Multimedia Tools and Applications, Volume 78, Issue 2, pp 2017–2044

An effective method for audio-to-score alignment using onsets and modified constant Q spectra

  • Chunta Chen
  • Jyh-Shing Roger Jang

Abstract

This paper proposes an effective algorithm for polyphonic audio-to-score alignment, which aligns a polyphonic music performance with its corresponding score. The proposed framework consists of three steps: onset detection, note matching, and dynamic programming. In the first step, onsets are detected and onset features are extracted by applying the constant Q transform around each onset. In the second step, a similarity matrix is computed using a note-matching function that evaluates the similarity between concurrent notes in the music score and onsets in the audio recording. Finally, dynamic programming is used to extract the optimal alignment path from the similarity matrix. We compared five onset detectors and three spectrum difference vectors at selected audio onsets. The experimental results show that our method achieved higher precision than the other algorithms included for comparison. This paper also proposes an online approach based on onset detection that can detect most notes within only 10 ms. In our experiments, this online approach outperforms all methods included for comparison when the tolerance window is 50 ms.
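
The final step of the framework, extracting an alignment path from a similarity matrix by dynamic programming, can be illustrated with a small sketch. This is not the authors' implementation: it is a generic dynamic-time-warping recurrence, assuming similarity values in [0, 1] that are converted to costs as 1 - similarity, and assuming the path may move diagonally, to the next score event, or to the next audio onset. The function name `align_by_dynamic_programming` and the example matrix are hypothetical.

```python
def align_by_dynamic_programming(similarity):
    """Return (path, total_cost) for the cheapest monotonic path.

    similarity[i][j] is an assumed similarity in [0, 1] between
    score event i (a set of concurrent notes) and audio onset j.
    The cost of a cell is taken to be 1 - similarity (an assumption).
    """
    n, m = len(similarity), len(similarity[0])
    inf = float("inf")
    # total[i][j]: lowest cumulative cost of any path from (0, 0) to (i, j)
    total = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    total[0][0] = 1.0 - similarity[0][0]

    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Allowed predecessors: diagonal, previous score event, previous onset
            candidates = []
            if i > 0 and j > 0:
                candidates.append((total[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((total[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((total[i][j - 1], (i, j - 1)))
            best_cost, best_prev = min(candidates, key=lambda c: c[0])
            total[i][j] = best_cost + (1.0 - similarity[i][j])
            back[i][j] = best_prev

    # Trace back from the last cell to the first to recover the path
    path = []
    cell = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return path, total[n - 1][m - 1]


# Hypothetical example: 3 score events, 4 detected onsets
# (onset 2 is an extra onset that also resembles score event 1).
similarity = [
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.8, 0.7, 0.0],
    [0.0, 0.1, 0.2, 0.9],
]
path, cost = align_by_dynamic_programming(similarity)
print(path)
```

In this toy matrix the path assigns both onset 1 and onset 2 to score event 1, which is how a warping path absorbs a spurious or repeated onset. The note-matching function that would produce the similarity values from constant Q features is not specified in the abstract and is therefore left out.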

Keywords

  • Music synchronization
  • Audio-to-score alignment
  • Audio onset detection
  • Score following

Notes

Acknowledgments

This research was partially supported by the Ministry of Science and Technology, ROC, under Grant No. MOST 104-2221-E-002-051-MY3.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Computer Science Department, National Tsing Hua University, Hsinchu City, Taiwan
  2. Computer Science Department, National Taiwan University, Taipei City, Taiwan