Polyphonic Transcription: Exploring a Hybrid of Tone Models and Particle Swarm Optimisation

  • Somnuk Phon-Amnuaisuk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7247)


Polyphonic transcription could be formulated as a supervised classification task if classifiers for all possible polyphonic combinations could be learned beforehand. In practice this is infeasible, because the number of possible polyphonic combinations grows exponentially. Here, we describe a novel polyphonic transcription approach that combines Particle Swarm Optimisation (PSO) with Tone-models. This hybrid exploits the strengths of both the heuristic-search and the model-based approaches. Only the monophonic Tone-models of the individual pitches are learned; they are used to compute a first-pass polyphonic transcription, which PSO then refines in a second pass. The experimental results show that the proposed hybrid approach outperforms the competing Non-negative Matrix Factorisation (NMF) approach. This paper presents and discusses the design and the experimental results of this novel approach.
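
The two-pass idea can be illustrated with a small sketch. The abstract does not specify the form of the Tone-models, the fitness function, or the PSO settings, so everything below is an assumption made for illustration: the tone models are synthetic harmonic templates, the first pass is a simple template correlation, and the second pass is a textbook global-best PSO that minimises the squared reconstruction error. All function names and parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_tone_models(n_pitches=12, n_bins=256, n_harmonics=5, f0_bin=20.0):
    """Hypothetical monophonic tone models: one unit-norm harmonic template per pitch."""
    bins = np.arange(n_bins)
    models = np.zeros((n_pitches, n_bins))
    for p in range(n_pitches):
        f0 = f0_bin * 2.0 ** (p / 12.0)  # semitone spacing (assumed)
        for h in range(1, n_harmonics + 1):
            # Gaussian peak at each harmonic, with 1/h amplitude roll-off (assumed)
            models[p] += (1.0 / h) * np.exp(-0.5 * (bins - h * f0) ** 2)
        models[p] /= np.linalg.norm(models[p])
    return models


def first_pass(spectrum, models):
    """First pass (assumed): correlate each tone model with the observed spectrum."""
    return np.clip(models @ spectrum, 0.0, None)


def pso_refine(spectrum, models, init, n_particles=30, n_iters=200, seed=0):
    """Second pass (assumed): global-best PSO over non-negative pitch activations."""
    rng = np.random.default_rng(seed)

    def cost(activations):
        # Squared error between the observation and the mixture of tone models
        return np.sum((spectrum - activations @ models) ** 2)

    # Start the swarm around the first-pass estimate; particle 0 is the estimate itself
    pos = np.clip(init + 0.1 * rng.standard_normal((n_particles, init.size)), 0.0, None)
    pos[0] = init
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    best = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[best].copy(), pbest_cost[best]

    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed)
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, None)  # keep activations non-negative
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        best = int(np.argmin(pbest_cost))
        if pbest_cost[best] < gbest_cost:
            gbest, gbest_cost = pbest[best].copy(), pbest_cost[best]
    return gbest, gbest_cost


if __name__ == "__main__":
    models = make_tone_models()
    truth = np.zeros(models.shape[0])
    truth[[0, 4, 7]] = 1.0  # a synthetic three-note chord
    spectrum = truth @ models
    rough = first_pass(spectrum, models)
    refined, refined_cost = pso_refine(spectrum, models, rough)
    print("first-pass cost:", np.sum((spectrum - rough @ models) ** 2))
    print("refined cost:   ", refined_cost)
```

Because the first-pass estimate is itself placed in the swarm, the refined solution in this sketch can never have a higher reconstruction error than the first pass. That only mirrors the refinement role the abstract assigns to PSO and says nothing about the transcription accuracy reported in the paper.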


Polyphonic music transcription; Hybrid of Tone-models; Particle swarm optimisation





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Somnuk Phon-Amnuaisuk (1)
  1. Music Informatics Research Group, Faculty of Creative Industries, Universiti Tunku Abdul Rahman, Malaysia
