Abstract
Gender detection supports applications such as speaker and emotion recognition, which in turn serve online learning, telecom caller identification, and similar services. It is also used in speech analysis and in initiating human-machine interaction. Although gender detection is a complex process, it is an essential part of digital systems that handle voice. The proposed approach detects gender from speech by combining two acoustic features: shifted delta cepstral (SDC) coefficients and pitch. The first step preprocesses the speech sample to retain valid speech data. The second step calculates the pitch and SDC for each frame. A multifeature fusion method then combines these features, and an XGBoost model is applied to detect gender. The approach achieves accuracy rates of 99.44% and 99.37% on the RAVDESS and TIMIT datasets, respectively, in comparison with previously reported methods.
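The per-frame feature steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sdc`, `pitch_autocorr`, `fuse`), the SDC parameters (`d`, `p`, `k`), the autocorrelation pitch estimator, the pitch search range, and fusion by simple concatenation are all assumptions made for illustration. The cepstral matrix is assumed to have been computed beforehand (for example, MFCCs from preprocessed speech).

```python
import numpy as np


def sdc(cep, d=1, p=3, k=7):
    """Shifted delta cepstra for a (frames, N) cepstral matrix.

    For frame t and block i = 0..k-1:
        delta_i(t) = c(t + i*p + d) - c(t + i*p - d)
    The k delta blocks are stacked into one vector of length N*k.
    Indices past the signal edges are clipped (edge frames repeat).
    The defaults d=1, p=3, k=7 are an assumed, commonly used setting.
    """
    cep = np.asarray(cep, dtype=float)
    n_frames = cep.shape[0]
    t = np.arange(n_frames)
    blocks = []
    for i in range(k):
        ahead = np.clip(t + i * p + d, 0, n_frames - 1)
        behind = np.clip(t + i * p - d, 0, n_frames - 1)
        blocks.append(cep[ahead] - cep[behind])
    return np.hstack(blocks)  # shape: (frames, N * k)


def pitch_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame by autocorrelation.

    Picks the lag with the largest autocorrelation inside the
    assumed search range [fmin, fmax]. Returns 0.0 for a silent
    frame or a frame too short to cover the range.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(ac) - 1)
    if ac[0] <= 0 or hi <= lo:
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag


def fuse(pitch, sdc_feats):
    """Fuse features by concatenating pitch and SDC per frame."""
    pitch = np.asarray(pitch, dtype=float).reshape(-1, 1)
    return np.hstack([pitch, sdc_feats])  # shape: (frames, 1 + N * k)
```

Under these assumptions, the fused matrix has one row per frame and could be passed to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`; the abstract does not say how frame-level features are aggregated or which hyperparameters are used.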
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mohanty, A., Cherukuri, R.C. (2024). Improving Speaker Gender Detection by Combining Pitch and SDC. In: Nanda, S.J., Yadav, R.P., Gandomi, A.H., Saraswat, M. (eds) Data Science and Applications. ICDSA 2023. Lecture Notes in Networks and Systems, vol 818. Springer, Singapore. https://doi.org/10.1007/978-981-99-7862-5_34
DOI: https://doi.org/10.1007/978-981-99-7862-5_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7861-8
Online ISBN: 978-981-99-7862-5
eBook Packages: Intelligent Technologies and Robotics (R0)