Automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using hidden Markov models (HMMs)

International Journal of Speech Technology

Abstract

Machine learning provides researchers in speech processing and bioacoustics with numerous advanced, non-invasive techniques for investigating animal vocalizations. In this work, Hidden Markov Models (HMMs) were developed and implemented for the automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using traditional spectral and temporal features, namely Mel-Frequency Cepstral Coefficients (MFCCs) with delta (velocity) and delta-delta (acceleration) coefficients. The combined features were extracted from frames of the vocalizations using a 4 ms frame size and a 2 ms step size and modeled with 4-state, left-to-right HMMs. Gender recognition and speaker identification were then performed on a database of 7285 coo call-types from 8 animals (4 males, 4 females). Gender recognition achieved an accuracy of 84.45% (1233/1460 correct recognitions). Speaker identification yielded 91.08% (633/695 correct identifications) for the 4 males, 83.27% (637/765 correct identifications) for the 4 females, and 81.85% (1195/1460 correct identifications) for all 8 animals. Based on this performance, the novel contribution of the framework, namely the automated application of HMMs to the gender recognition and speaker identification of Rhesus Macaques (M. mulatta), could readily be extended to other mammals for automatic classification and recognition.
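The pipeline described in the abstract (short overlapping frames, a 4-state left-to-right HMM per class, and classification by the best-scoring model) can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the function names, the sample rate, the diagonal-Gaussian emissions, the fixed self-transition probability, and the assumption that every model starts in its first state are all hypothetical choices made for illustration, and MFCC extraction and model training are omitted.

```python
import numpy as np

def frame_signal(x, sample_rate, frame_ms=4.0, step_ms=2.0):
    """Split a 1-D signal into overlapping frames (4 ms window, 2 ms step)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    step = int(round(sample_rate * step_ms / 1000.0))
    n_frames = 1 + (len(x) - frame_len) // step
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    return x[idx]

def left_to_right_transitions(n_states=4, stay=0.6):
    """Transition matrix in which a state either repeats or advances by one.

    The value of `stay` is an illustrative assumption, not a trained parameter.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay
        A[i, i + 1] = 1.0 - stay
    A[-1, -1] = 1.0  # final state is absorbing
    return A

def log_gaussian(obs, means, variances):
    """Log density of each frame under each state's diagonal Gaussian.

    obs: (T, D); means, variances: (N, D). Returns an array of shape (T, N).
    """
    diff = obs[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return -0.5 * (log_norm[None, :] + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def forward_loglik(obs, A, means, variances):
    """Log-likelihood of a feature sequence via the forward algorithm (log domain)."""
    log_b = log_gaussian(obs, means, variances)
    with np.errstate(divide="ignore"):
        log_A = np.log(A)  # zeros become -inf, i.e. forbidden transitions
    log_alpha = np.full(A.shape[0], -np.inf)
    log_alpha[0] = log_b[0, 0]  # assumed: every sequence starts in state 0
    for t in range(1, len(obs)):
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(log_alpha)

def classify(obs, models):
    """Return the label whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda label: forward_loglik(obs, *models[label]))
```

In use, `models` would map each class label (for example a gender or an individual animal) to a tuple `(A, means, variances)` estimated from training calls, and `obs` would hold the per-frame feature vectors of one test call.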

Data availability

N/A.

Funding

N/A.

Author information

Contributions

The author was the sole contributor to the research work.

Corresponding author

Correspondence to Marek B. Trawicki.

Ethics declarations

Competing interests

The author declares no conflict of interest.

Ethical approval

The author maintained the highest level of integrity in the research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Trawicki, M.B. Automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using hidden Markov models (HMMs). Int J Speech Technol 27, 179–186 (2024). https://doi.org/10.1007/s10772-024-10090-z

  • DOI: https://doi.org/10.1007/s10772-024-10090-z
