Pitch-Dependent Identification of Musical Instrument Sounds

Applied Intelligence

Abstract

This paper describes a musical instrument identification method that takes into consideration the pitch dependency of the timbres of musical instruments. The difficulty in musical instrument identification resides in this pitch dependency: the acoustic features of most musical instruments vary according to the pitch (fundamental frequency, F0). To cope with this difficulty, we propose an F0-dependent multivariate normal distribution, where each element of the mean vector is represented by a function of F0. Our method first extracts 129 features (e.g., the spectral centroid, the gradient of the straight line approximating the power envelope) from a musical instrument sound and then reduces the dimensionality of the feature space to 18 dimensions. In the 18-dimensional feature space, it calculates an F0-dependent mean function and an F0-normalized covariance, and finally applies the Bayes decision rule. Experimental results of identifying 6,247 solo tones of 19 musical instruments show that the proposed method improved the recognition rate from 75.73% to 79.73%.
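The pipeline the abstract outlines (per-instrument Gaussian models whose mean depends on F0, an F0-normalized covariance on the residuals, and a Bayes decision rule) can be sketched in code. Below is a minimal Python sketch of that core idea. The cubic polynomial of log F0, the class name F0DependentGaussian, and the use of explicit class priors are illustrative assumptions, not details taken from the paper; the 129-feature extraction and the reduction to 18 dimensions are assumed to have happened upstream.

    import numpy as np

    class F0DependentGaussian:
        """One model per instrument: an F0-dependent mean function plus an
        F0-normalized covariance. The cubic polynomial of log F0 is an
        illustrative assumption, not the functional form from the paper."""

        def __init__(self, degree=3):
            self.degree = degree

        def fit(self, X, f0):
            # X: (n, 18) feature vectors; f0: (n,) fundamental frequencies in Hz.
            V = np.vander(np.log(f0), self.degree + 1)         # polynomial basis in log F0
            self.coef, *_ = np.linalg.lstsq(V, X, rcond=None)  # least-squares mean function
            resid = X - V @ self.coef                          # remove the pitch dependency
            self.cov = np.cov(resid, rowvar=False)             # F0-normalized covariance
            self.inv_cov = np.linalg.inv(self.cov)
            self.logdet = np.linalg.slogdet(self.cov)[1]
            return self

        def log_likelihood(self, x, f0):
            # Evaluate the F0-dependent mean at this note's F0, then score x
            # under the multivariate normal with the shared covariance.
            v = np.vander(np.log(np.array([f0])), self.degree + 1)
            diff = x - (v @ self.coef)[0]
            return -0.5 * (diff @ self.inv_cov @ diff + self.logdet
                           + len(x) * np.log(2.0 * np.pi))

    def classify(x, f0, models, priors):
        # Bayes decision rule: choose the instrument maximizing
        # log p(x | instrument, F0) + log P(instrument).
        return max(models, key=lambda m: models[m].log_likelihood(x, f0)
                   + np.log(priors[m]))

Fitting the mean as a function of F0 and then computing the covariance on the residuals is what distinguishes this from an ordinary per-class Gaussian: the covariance no longer absorbs variation that is really pitch dependency.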



Author information

Corresponding author

Correspondence to Tetsuro Kitahara.

Additional information

This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) through a Grant-in-Aid for Scientific Research (A), No. 15200015, and by the Informatics Research Center for Development of Knowledge Society Infrastructure (a MEXT COE program).

Tetsuro Kitahara received the B.S. degree from Tokyo University of Science in 2002 and the M.S. degree from Kyoto University in 2004. He is currently a Ph.D. student at the Graduate School of Informatics, Kyoto University. Since 2005, he has been a Research Fellow of the Japan Society for the Promotion of Science. His research interests include music informatics. He received the IPSJ 65th National Convention Student Award in 2003, the IPSJ 66th National Convention Student Award and the TELECOM System Technology Award for Student in 2004, and the IPSJ 67th National Convention Best Paper Award for Young Researcher in 2005. He is a student member of IPSJ, IEICE, JSAI, ASJ, and JSMPC.

Masataka Goto received his Doctor of Engineering degree in Electronics, Information and Communication Engineering from Waseda University, Japan, in 1998. He then joined the Electrotechnical Laboratory (ETL; reorganized as the National Institute of Advanced Industrial Science and Technology (AIST) in 2001), where he has been engaged as a researcher ever since. He served concurrently as a researcher in Precursory Research for Embryonic Science and Technology (PRESTO), Japan Science and Technology Corporation (JST), from 2000 to 2003, and has served as an associate professor of the Department of Intelligent Interaction Technologies, Graduate School of Systems and Information Engineering, University of Tsukuba, since 2005. His research interests include music information processing and spoken language processing. Dr. Goto has received seventeen awards, including the IPSJ Best Paper Award and IPSJ Yamashita SIG Research Awards (MUS and SLP) from the Information Processing Society of Japan (IPSJ), the Awaya Prize for Outstanding Presentation and the Award for Outstanding Poster Presentation from the Acoustical Society of Japan (ASJ), the Award for Best Presentation from the Japanese Society for Music Perception and Cognition (JSMPC), the WISS 2000 Best Paper Award and Best Presentation Award, and the Interaction 2003 Best Paper Award. He is a member of the IPSJ, ASJ, JSMPC, Institute of Electronics, Information and Communication Engineers (IEICE), and International Speech Communication Association (ISCA).

Hiroshi G. Okuno received the B.A. and Ph.D. degrees from the University of Tokyo in 1972 and 1996, respectively. He worked for Nippon Telegraph and Telephone, the Kitano Symbiotic Systems Project, and Tokyo University of Science. He is currently a professor at the Department of Intelligence Technology and Science, Graduate School of Informatics, Kyoto University. He was a visiting scholar at Stanford University and a visiting associate professor at the University of Tokyo. He has done research in programming languages, parallel processing, and reasoning mechanisms in AI, and he is currently engaged in computational auditory scene analysis, music scene analysis, and robot audition. He received best paper awards from the Japanese Society for Artificial Intelligence and the International Society for Applied Intelligence in 1991 and 2001, respectively. He co-edited Computational Auditory Scene Analysis (Lawrence Erlbaum Associates, 1998) with David Rosenthal and Advanced Lisp Technology (Taylor and Francis, 2002) with Taiichi Yuasa. He is a member of IPSJ, JSAI, JSSST, JSCS, ACM, AAAI, ASA, and IEEE.


Cite this article

Kitahara, T., Goto, M. & Okuno, H.G. Pitch-Dependent Identification of Musical Instrument Sounds. Appl Intell 23, 267–275 (2005). https://doi.org/10.1007/s10489-005-4612-1
