Abstract
Voice separation is a critical component of music information retrieval, music analysis and automated transcription systems. We present a contig mapping approach to voice separation based on perceptual principles. The algorithm runs in O(n²) time, uses only pitch height and event boundaries, and requires no user-defined parameters. The method segments a piece into contigs according to voice count, then reconnects fragments in adjacent contigs using a shortest distance strategy. Connections are made in order of distance from the maximal voice contigs, where the voice ordering is known. This contig mapping algorithm has been implemented in VoSA, a Java-based voice separation analyzer. The algorithm performed well when applied to J. S. Bach’s Two- and Three-Part Inventions and the forty-eight Fugues from the Well-Tempered Clavier. We report an overall average fragment consistency of 99.75%, a correct fragment connection rate of 94.50% and an average voice consistency of 88.98%, three metrics that we propose for measuring voice separation performance.
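The two core steps named in the abstract, segmenting by voice count and connecting fragments by shortest distance, can be illustrated with a minimal Python sketch. This is not the VoSA implementation. The note representation `(onset, offset, pitch)`, the use of MIDI note numbers for pitch height, the function names, and the assumption that connected voices do not cross are all illustrative choices made here. The sketch also omits the connection ordering outward from maximal voice contigs.

```python
from itertools import combinations

# Hypothetical note representation: (onset, offset, pitch),
# with pitch assumed to be a MIDI note number.

def segment_into_contigs(notes):
    """Return (start, end, voice_count) spans; a new span starts
    wherever the number of simultaneously sounding notes changes."""
    times = sorted({t for on, off, _ in notes for t in (on, off)})
    contigs = []
    for start, end in zip(times, times[1:]):
        # No note starts or ends inside (start, end), so the count is constant.
        count = sum(1 for on, off, _ in notes if on < end and off > start)
        if contigs and contigs[-1][2] == count:
            contigs[-1][1] = end  # same voice count: extend the current contig
        else:
            contigs.append([start, end, count])
    return [tuple(c) for c in contigs]

def connect_fragments(left_pitches, right_pitches):
    """Pair fragments of two adjacent contigs so that the total pitch
    distance is minimal. Inputs are the boundary pitches of each fragment,
    sorted low to high. Pairs keep that order (an assumed no-crossing rule).
    Returns a list of (left_index, right_index) pairs."""
    k = min(len(left_pitches), len(right_pitches))
    best_cost, best_pairs = None, []
    for li in combinations(range(len(left_pitches)), k):
        for ri in combinations(range(len(right_pitches)), k):
            cost = sum(abs(left_pitches[a] - right_pitches[b])
                       for a, b in zip(li, ri))
            if best_cost is None or cost < best_cost:
                best_cost, best_pairs = cost, list(zip(li, ri))
    return best_pairs

notes = [(0, 4, 60), (0, 2, 72), (2, 4, 67), (4, 6, 48)]
contigs = segment_into_contigs(notes)
pairs = connect_fragments([48, 60, 72], [59, 74])
```

Here the first four time units form a two-voice contig and the last two a one-voice contig. In the connection example, the three-voice side leaves its lowest fragment unconnected because pairing 60 with 59 and 72 with 74 gives the smallest total distance. The exhaustive search is exponential in the voice count and is used only because it is easy to read for the small voice counts of keyboard polyphony.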
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chew, E., Wu, X. (2005). Separating Voices in Polyphonic Music: A Contig Mapping Approach. In: Wiil, U.K. (eds) Computer Music Modeling and Retrieval. CMMR 2004. Lecture Notes in Computer Science, vol 3310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31807-1_1
DOI: https://doi.org/10.1007/978-3-540-31807-1_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24458-5
Online ISBN: 978-3-540-31807-1
eBook Packages: Computer Science (R0)