Abstract
Machine learning offers researchers in speech processing and bioacoustics numerous advanced, non-invasive techniques for investigating animal vocalizations. In this work, Hidden Markov Models (HMMs) were developed and implemented for the automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using traditional spectral and temporal features, namely Mel-Frequency Cepstral Coefficients (MFCCs) with delta (velocity) and delta-delta (acceleration) coefficients. The combined features were extracted from frames of the vocalizations using a 4 ms frame size and a 2 ms step size and modeled with 4-state, left-to-right HMMs; both tasks were performed on a database of 7285 coo calls from 8 animals (4 males, 4 females). Gender recognition achieved an accuracy of 84.45% (1233/1460 correct recognitions). Speaker identification yielded 91.08% (633/695 correct identifications) for the 4 males, 83.27% (637/765 correct identifications) for the 4 females, and 81.85% (1195/1460 correct identifications) for all 8 animals. Based on this performance, the framework's novel contribution, the automated application of HMMs to gender recognition and speaker identification of Rhesus Macaques (M. mulatta), could be extended to other mammals for automatic classification and recognition.
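As a rough illustration of the feature and model structure described in the abstract, the sketch below appends delta and delta-delta coefficients to a matrix of static features and builds a 4-state, left-to-right transition matrix. It is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the regression window width, the self-transition probability, the placeholder feature dimension, and the function names `delta`, `stack_with_deltas`, and `left_to_right_transitions` are all hypothetical, since the abstract does not give these details.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta coefficients for a (frames, dims) array.

    Uses the common formula
        d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2),  k = 1..width,
    with edge frames replicated as padding. The window width is an assumption.
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]   # c_{t+k}
        behind = padded[width - k : width - k + n_frames]  # c_{t-k}
        out += k * (ahead - behind)
    return out / denom


def stack_with_deltas(static: np.ndarray) -> np.ndarray:
    """Concatenate static, delta (velocity), and delta-delta (acceleration)."""
    velocity = delta(static)
    acceleration = delta(velocity)
    return np.hstack([static, velocity, acceleration])


def left_to_right_transitions(n_states: int = 4, stay: float = 0.6) -> np.ndarray:
    """Transition matrix allowing only self-loops and moves to the next state.

    The self-transition probability `stay` is an illustrative initial value;
    in practice it would be re-estimated during training.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay
        A[i, i + 1] = 1.0 - stay
    A[-1, -1] = 1.0  # final state absorbs
    return A


if __name__ == "__main__":
    # Placeholder static features: 10 frames x 12 coefficients (assumed size).
    static = np.random.default_rng(0).normal(size=(10, 12))
    combined = stack_with_deltas(static)
    print(combined.shape)
    print(left_to_right_transitions())
```

The static features here are random placeholders; in a real pipeline they would be MFCCs computed from the recorded calls, and one HMM per class (gender or individual) would be trained on the combined feature vectors.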
Data availability
N/A.
Funding
N/A.
Author information
Contributions
The author was the sole contributor to the research work.
Ethics declarations
Competing interests
The author declares no conflict of interest.
Ethical approval
The author maintained the highest level of integrity in the research work.
About this article
Cite this article
Trawicki, M.B. Automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using hidden Markov models (HMMs). Int J Speech Technol 27, 179–186 (2024). https://doi.org/10.1007/s10772-024-10090-z
DOI: https://doi.org/10.1007/s10772-024-10090-z