Improving Speaker Gender Detection by Combining Pitch and SDC

Mohanty, Aniruddha; Cherukuri, Ravindranath C.

doi:10.1007/978-981-99-7862-5_34

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 818)

Included in the following conference series:

International Conference on Data Science and Applications

Abstract

Gender detection is useful in a range of applications, such as speaker recognition and emotion recognition, which in turn support online learning, telecom caller identification, and similar services. It is also used in speech analysis and in initiating human-machine interaction. Although gender detection is a complex task, it is an essential component of digital systems that process voice. The proposed approach detects gender from speech by combining two acoustic features: shifted delta cepstral (SDC) coefficients and pitch. First, the speech sample is preprocessed to retain only valid speech data. Second, pitch and SDC features are calculated for each frame. A multifeature fusion method then combines these features, and an XGBoost model is applied to detect gender. The approach achieves accuracies of 99.44% on the RAVDESS dataset and 99.37% on the TIMIT dataset, outperforming the existing methods used for comparison.
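The four stages named in the abstract (preprocessing, per-frame pitch and SDC, feature fusion, classification) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every function name and parameter value below is a hypothetical choice: an energy threshold stands in for the unspecified preprocessing step, pitch is estimated from the autocorrelation peak, a plain real cepstrum stands in for the base cepstral features (the abstract does not name them; SDC is usually computed from MFCCs), SDC uses the N-d-P-k = 7-1-3-7 configuration common in language identification, and fusion is assumed to be concatenation of per-utterance means and standard deviations.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def drop_silent_frames(frames, rel_threshold_db=-30.0):
    """Hypothetical preprocessing: keep frames within 30 dB of the loudest frame."""
    energy_db = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return frames[energy_db > energy_db.max() + rel_threshold_db]


def frame_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) from the autocorrelation peak over lags matching [fmin, fmax]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sr // fmax)
    lag_max = min(int(sr // fmin), len(frame) - 1)
    if ac[0] <= 0 or lag_max <= lag_min:
        return 0.0  # silent frame or too short to search: treated as unvoiced
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def real_cepstrum(frame, n_ceps):
    """Stand-in for MFCCs (assumption): first n_ceps real-cepstrum coefficients."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    return np.fft.irfft(np.log(spectrum + 1e-10))[:n_ceps]


def shifted_delta_cepstra(cep, d=1, P=3, k=7):
    """SDC: for each frame t, stack k deltas c[t+iP+d] - c[t+iP-d], i = 0..k-1.

    cep has shape (T, N); the result has shape (T, N*k). Edge frames are
    replicated so that every shifted index stays in range.
    """
    T = cep.shape[0]
    # Padded row j holds original row j - d.
    padded = np.pad(cep, ((d, (k - 1) * P + d), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        plus = padded[i * P + 2 * d: i * P + 2 * d + T]  # c[t + iP + d]
        minus = padded[i * P: i * P + T]  # c[t + iP - d]
        blocks.append(plus - minus)
    return np.hstack(blocks)


def utterance_features(x, sr, frame_ms=25, hop_ms=10, n_ceps=7):
    """Fuse SDC and pitch into one fixed-length vector per utterance."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = drop_silent_frames(frame_signal(x, frame_len, hop))
    pitch = np.array([frame_pitch(f, sr) for f in frames])
    voiced = pitch[pitch > 0]
    pitch_stats = [voiced.mean(), voiced.std()] if voiced.size else [0.0, 0.0]
    cep = np.array([real_cepstrum(f, n_ceps) for f in frames])
    sdc = shifted_delta_cepstra(cep)
    # Fusion by concatenation (assumption): SDC mean, SDC std, pitch mean, pitch std.
    return np.concatenate([sdc.mean(axis=0), sdc.std(axis=0), pitch_stats])
```

With 7 cepstral coefficients and k = 7 blocks, each utterance yields 7 × 7 × 2 + 2 = 100 features. Vectors for a labelled corpus could then be stacked into a matrix `X` and passed, with labels `y`, to a gradient-boosted tree classifier such as `xgboost.XGBClassifier` via its `fit(X, y)` method; the hyperparameters used in the paper are not given in the abstract.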

Author information

Authors and Affiliations

CHRIST (Deemed to be University), Bangalore, Karnataka, India
Aniruddha Mohanty & Ravindranath C. Cherukuri

Corresponding author

Correspondence to Aniruddha Mohanty.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
Satyasai Jagannath Nanda
Department of Electronics and Communication Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
Rajendra Prasad Yadav
Department of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia
Amir H. Gandomi
Department of Computer Science and Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
Mukesh Saraswat

About this paper

Cite this paper

Mohanty, A., Cherukuri, R.C. (2024). Improving Speaker Gender Detection by Combining Pitch and SDC. In: Nanda, S.J., Yadav, R.P., Gandomi, A.H., Saraswat, M. (eds) Data Science and Applications. ICDSA 2023. Lecture Notes in Networks and Systems, vol 818. Springer, Singapore. https://doi.org/10.1007/978-981-99-7862-5_34

DOI: https://doi.org/10.1007/978-981-99-7862-5_34
Published: 16 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7861-8
Online ISBN: 978-981-99-7862-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)