Abstract
The goal of this work is to design a system that identifies the gender of a speaker from speech. Gender classification is an emerging research area that supports efficient human-machine interaction using speech files. Numerous approaches to gender classification have been proposed in the past. Speech recognition serves as a prime approach for identifying the source; other cues include a person's gait, lip shape, facial features, and iris code. In this paper, gender is classified from speech files using machine learning-based systems. Such systems may be deployed in critical investigative settings such as crime scenes. Speech recognition poses several challenges, such as detecting multilingual segments within a speech stream and determining the gender of the speaker. Many algorithms have been applied to these problems, including frequency estimation, matrix representation, Gaussian mixture models, pattern matching, hidden Markov models, vector quantization, decision trees, and neural networks. In this work, we use two machine learning methods, neural networks and decision trees, to classify gender; both are explained further in the literature review.
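As a rough illustration of two of the techniques named above, frequency estimation and a decision-tree style split, the following minimal sketch estimates the fundamental frequency of a synthetic tone by autocorrelation and applies a single threshold to it. It is not the chapter's method: the function names, the synthetic signals, the 8000 Hz sample rate, and the 165 Hz threshold are all assumptions chosen for illustration, and a real system would learn its splits from labeled speech features.

```python
import math


def synth_tone(f0_hz, sample_rate=8000, duration_s=0.1):
    """Hypothetical stand-in for a voiced speech frame: a sine at f0 plus one harmonic."""
    n = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * f0_hz * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * t / sample_rate)
        for t in range(n)
    ]


def estimate_f0(signal, sample_rate=8000, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency as the lag with the largest autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def classify_by_pitch(f0_hz, threshold_hz=165.0):
    """Depth-1 decision rule (a stump). The threshold is an illustrative assumption."""
    return "male" if f0_hz < threshold_hz else "female"


if __name__ == "__main__":
    for true_f0 in (125.0, 200.0):
        estimate = estimate_f0(synth_tone(true_f0))
        print(true_f0, estimate, classify_by_pitch(estimate))
```

A single pitch threshold is far too crude for real recordings, where pitch ranges overlap and noise distorts the autocorrelation peak. That limitation is one reason to train classifiers such as decision trees or multilayer perceptrons on richer acoustic features.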
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Yadav, M., Verma, V.K., Yadav, C.S., Verma, J.K. (2020). MLPGI: Multilayer Perceptron-Based Gender Identification Over Voice Samples in Supervised Machine Learning. In: Johri, P., Verma, J., Paul, S. (eds) Applications of Machine Learning. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-3357-0_23
DOI: https://doi.org/10.1007/978-981-15-3357-0_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3356-3
Online ISBN: 978-981-15-3357-0
eBook Packages: Engineering, Engineering (R0)