Separating Voices in Polyphonic Music: A Contig Mapping Approach

  • Elaine Chew
  • Xiaodan Wu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3310)


Voice separation is a critical component of music information retrieval, music analysis, and automated transcription systems. We present a contig mapping approach to voice separation based on perceptual principles. The algorithm runs in O(n²) time, uses only pitch height and event boundaries, and requires no user-defined parameters. The method segments a piece into contigs according to voice count, then reconnects fragments in adjacent contigs using a shortest-distance strategy. The order of connection is determined by distance from the maximal voice contigs, where the voice ordering is known. This contig-mapping algorithm has been implemented in VoSA, a Java-based voice separation analyzer. The algorithm performed well when applied to J. S. Bach's Two- and Three-Part Inventions and the forty-eight fugues from the Well-Tempered Clavier. We report an overall average fragment consistency of 99.75%, a correct fragment connection rate of 94.50%, and an average voice consistency of 88.98%, metrics which we propose for measuring voice separation performance.
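The two steps named in the abstract, segmenting into contigs of constant voice count and reconnecting fragments across contig boundaries by pitch proximity, can be illustrated with a minimal sketch. This is not the paper's implementation (VoSA is Java-based and the full algorithm is not given here); the function names, the `(onset, offset, pitch)` note layout, and the greedy pairing are all assumptions for illustration.

```python
# Hypothetical sketch of the contig idea: cut the time axis wherever the
# number of sounding notes changes, then pair fragments across a contig
# boundary by smallest pitch distance. Data layout and names are assumed,
# not taken from the paper.

def voice_count_segments(notes):
    """Split the piece into contigs of constant voice count.

    notes: list of (onset, offset, pitch) tuples.
    Returns a list of (start, end, fragments) with fragments sorted
    from highest to lowest pitch.
    """
    bounds = sorted({t for on, off, _ in notes for t in (on, off)})
    contigs = []
    for start, end in zip(bounds, bounds[1:]):
        sounding = [n for n in notes if n[0] < end and n[1] > start]
        if sounding:
            contigs.append((start, end, sorted(sounding, key=lambda n: -n[2])))
    return contigs

def connect(frags_a, frags_b):
    """Greedily pair fragments across a boundary by shortest pitch distance."""
    pairs = []
    remaining = list(range(len(frags_b)))
    for i, note_a in enumerate(frags_a):
        if not remaining:
            break
        j = min(remaining, key=lambda k: abs(note_a[2] - frags_b[k][2]))
        remaining.remove(j)
        pairs.append((i, j))
    return pairs
```

For example, three notes forming two contigs (two voices throughout, with the upper voice changing note at time 1) connect so that the upper fragments join and the sustained lower note joins to itself. The paper's actual strategy additionally orders connections by distance from maximal voice contigs, which this greedy sketch omits.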





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Elaine Chew¹
  • Xiaodan Wu¹

  1. Epstein Department of Industrial and Systems Engineering, Viterbi School of Engineering, Integrated Media Systems Center, University of Southern California, Los Angeles, USA
