Automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using hidden Markov models (HMMs)

Trawicki, Marek B.

doi:10.1007/s10772-024-10090-z

Published: 14 March 2024

International Journal of Speech Technology, Volume 27, pages 179–186 (2024)

Marek B. Trawicki (ORCID: orcid.org/0000-0002-5784-5632)

Abstract

Machine learning provides researchers in speech processing and bioacoustics with numerous advanced, non-invasive techniques for investigating animal vocalizations. In this work, Hidden Markov Models (HMMs) were developed and implemented for the automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using traditional spectral and temporal features, namely Mel-Frequency Cepstral Coefficients (MFCCs) together with delta (velocity) and delta-delta (acceleration) coefficients. The combined features were extracted from frames of the vocalizations using a 4 ms frame size and a 2 ms step size and modeled with 4-state, left-to-right HMMs; gender recognition and speaker identification were then performed on a database of 7285 coo calls from 8 animals (4 males, 4 females). Gender recognition achieved an accuracy of 84.45% (1233/1460 correct recognitions). Speaker identification yielded 91.08% (633/695 correct identifications) for the 4 males, 83.27% (637/765 correct identifications) for the 4 females, and 81.85% (1195/1460 correct identifications) for all 8 animals. Based on this performance, the novel contribution of the framework, the automated application of HMMs to gender recognition and speaker identification of Rhesus Macaques (M. mulatta), could readily be extended to other mammals for automatic classification and recognition.
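To make the classification step concrete, the following is a minimal sketch of how a vocalization could be scored against 4-state, left-to-right HMMs and assigned to the best-scoring class. It is not the author's implementation: the self-transition probability, the single diagonal Gaussian per state, the one-dimensional toy features, and all function names are assumptions chosen for illustration, and in practice the frames would hold MFCC, delta, and delta-delta coefficients with parameters trained from data.

```python
import math

def left_to_right_transitions(n_states, stay=0.6):
    """Transition matrix in which each state may only stay or advance by one.
    The value of `stay` is an illustrative assumption, not a trained value."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = stay
        A[i][i + 1] = 1.0 - stay
    A[-1][-1] = 1.0  # the last state can only repeat
    return A

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(frames, A, means, variances):
    """Forward-algorithm log-likelihood of a frame sequence.
    Assumes the model starts in state 0 and may end in any state."""
    n = len(A)
    log_A = [[math.log(a) if a > 0 else -math.inf for a in row] for row in A]
    alpha = [-math.inf] * n
    alpha[0] = log_gauss(frames[0], means[0], variances[0])
    for x in frames[1:]:
        alpha = [logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
                 + log_gauss(x, means[j], variances[j]) for j in range(n)]
    return logsumexp(alpha)

def classify(frames, models):
    """Return the label whose HMM gives the highest log-likelihood."""
    return max(models, key=lambda label: forward_loglik(frames, *models[label]))

# Hypothetical toy models: one HMM per class, 1-D features, unit variances.
A = left_to_right_transitions(4)
unit_var = [[1.0]] * 4
models = {
    "male": (A, [[0.0], [1.0], [2.0], [3.0]], unit_var),
    "female": (A, [[5.0], [6.0], [7.0], [8.0]], unit_var),
}

call = [[0.1], [0.9], [2.2], [2.8]]  # made-up frame sequence
print(classify(call, models))
```

The same `classify` function covers speaker identification if the dictionary holds one model per animal instead of one per gender; only the set of labels changes.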

Data availability

N/A.

Funding

N/A.

Author information

Authors and Affiliations

Marquette University, 1313 W. Wisconsin Avenue, Milwaukee, WI, 53233, USA
Marek B. Trawicki

Contributions

The author was the sole contributor to the research work.

Corresponding author

Correspondence to Marek B. Trawicki.

Ethics declarations

Competing interests

The author declares no conflict of interest.

Ethical approval

The author maintained the highest level of integrity in the research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Trawicki, M.B. Automatic gender recognition and speaker identification of Rhesus Macaques (Macaca mulatta) using hidden Markov models (HMMs). Int J Speech Technol 27, 179–186 (2024). https://doi.org/10.1007/s10772-024-10090-z

Received: 30 November 2023
Accepted: 13 February 2024
Published: 14 March 2024
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10772-024-10090-z
