An Integrated Processing Method Based on Wasserstein Barycenter Algorithm for Automatic Music Transcription

  • Cong Jin
  • Zhongtong Li
  • Yuanyuan Sun
  • Haiyin Zhang
  • Xin Lv (corresponding author)
  • Jianguang Li
  • Shouxun Liu
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 313)


Given an acoustic music signal, automatic music transcription (AMT) methods aim to generate the corresponding music notation without human intervention, and many such methods have been proposed. However, existing AMT methods based on signal processing or machine learning cannot perfectly reconstruct the original music and introduce significant distortion. In this paper, we propose a novel processing method that integrates various AMT methods to achieve better transcription performance. The integrated method is based on the entropic regularized Wasserstein Barycenter algorithm: entropic regularization speeds up the computation of the Wasserstein distance, and the barycenter is the discrete distribution that minimizes the weighted sum of its Wasserstein distances to the input distributions. Moreover, we introduce the proportional transportation distance (PTD) to evaluate the performance of different methods. Experimental results show that the proposed method improves precision and accuracy by approximately 48% and 67%, respectively, compared with the existing methods.
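The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of the generic entropic regularized Wasserstein barycenter computation (Sinkhorn-style scaling with iterative Bregman projections), not the authors' implementation. It assumes, hypothetically, that each AMT method's output for one frame is a nonnegative vector over a shared grid of pitch bins normalized to sum to 1, with squared bin distance (scaled to [0, 1]) as the ground cost. The names `sinkhorn_cost`, `sinkhorn_barycenter`, `bump`, and the parameter `eps` are illustrative.

```python
import numpy as np


def sinkhorn_cost(a, b, cost, eps=0.01, n_iter=2000):
    """Transport cost <P, C> of the entropic-regularized OT plan P between
    histograms a and b on a shared support, via Sinkhorn matrix scaling."""
    G = np.exp(-cost / eps)          # Gibbs kernel of the ground cost
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (G @ v)              # rescale rows so the plan's row sums match a
        v = b / (G.T @ u)            # rescale columns so column sums match b
    plan = u[:, None] * G * v[None, :]
    return float(np.sum(plan * cost))


def sinkhorn_barycenter(hists, cost, weights=None, eps=0.01, n_iter=2000, tol=1e-12):
    """Entropic-regularized Wasserstein barycenter of the rows of `hists`
    (K histograms on a shared support), by iterative Bregman projections."""
    hists = np.asarray(hists, dtype=float)
    k, n = hists.shape
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    G = np.exp(-cost / eps)
    V = np.ones((k, n))
    bary = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        U = hists / (V @ G.T)                # row k: a_k / (G v_k)
        KtU = U @ G                          # row k: G^T u_k
        new_bary = np.exp(w @ np.log(KtU))   # weighted geometric mean over k
        V = new_bary / KtU                   # row k: barycenter / (G^T u_k)
        converged = np.max(np.abs(new_bary - bary)) < tol
        bary = new_bary
        if converged:
            break
    return bary


# Hypothetical setup: 51 semitone bins; two "method outputs" for one frame,
# modeled as Gaussian bumps centred on bins 15 and 35.
bins = np.arange(51, dtype=float)
cost = (bins[:, None] - bins[None, :]) ** 2
cost /= cost.max()                           # scale the ground cost to [0, 1]


def bump(center, width=3.0):
    h = np.exp(-0.5 * ((bins - center) / width) ** 2)
    return h / h.sum()


a, b = bump(15), bump(35)
bary = sinkhorn_barycenter(np.stack([a, b]), cost)
print(int(np.argmax(bary)))                  # peak lies midway between the inputs
print(round(sinkhorn_cost(a, b, cost), 3))   # entropic transport cost between a and b
```

Unlike the arithmetic average `(a + b) / 2`, which keeps two peaks at bins 15 and 35, the barycenter concentrates its mass around bin 25. A larger `eps` converges faster but blurs the barycenter; a smaller `eps` approaches the unregularized Wasserstein barycenter at the cost of more iterations and possible underflow in the Gibbs kernel.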


Keywords: Automatic Music Transcription, Machine learning, Wasserstein Barycenter, Ensemble, NMF



Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2020

Authors and Affiliations

  • Cong Jin (1)
  • Zhongtong Li (1)
  • Yuanyuan Sun (1)
  • Haiyin Zhang (2)
  • Xin Lv (3)
  • Jianguang Li (4)
  • Shouxun Liu (4)
  1. School of Information and Communication Engineering, Communication University of China, Beijing, China
  2. School of Computer and Cyberspace Security, Communication University of China, Beijing, China
  3. School of Animation and Digital Arts, Communication University of China, Beijing, China
  4. Communication University of China, Beijing, China
