Abstract
Automatic speaker recognition models are built on a foundation of speaker characterization, pattern analysis, and feature engineering. This work focuses on the effect of classification and feature selection methods on speech emotion recognition. Selecting the right parameters in combination with the classifier is an important part of reducing the computational complexity of the system, and this becomes essential for models deployed in real-time scenarios. In this paper, a new deep learning based speech recognition model is presented that automatically recognizes spoken words. The quality of the input source, i.e. the speech sound, has a direct impact on the accuracy the classifier can attain. The Berlin database consists of around 500 utterances from both male and female speakers. On the applied dataset, the presented model achieves maximum accuracies of 94.21%, 83.54%, 83.65% and 78.13% using MFCC, prosodic, LSP and LPC features, respectively. The presented model offers better recognition performance than the other methods.
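Of the four feature sets compared above, MFCCs score highest. As a point of reference for how such features are derived, the following is a minimal NumPy-only sketch of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT-II). It is not the paper's implementation; the frame size, hop, filter count, and coefficient count are illustrative assumptions.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Sketch of the textbook MFCC pipeline; parameter values are assumptions."""
    # 1. Frame the signal and apply a Hann window to each frame
    frames = np.array([signal[s:s + n_fft] * np.hanning(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2. Per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 3. Triangular mel filterbank spanning 0 .. sr/2
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge

    # 4. Log mel energies, then DCT-II to decorrelate -> cepstral coefficients
    logmel = np.log(power @ fb.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T  # shape: (num_frames, n_ceps)

# Usage: MFCCs of a synthetic one-second 440 Hz tone
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
```

The resulting per-frame coefficient matrix is what a downstream classifier (deep or otherwise) would consume as its input features.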
Jermsittiparsert, K., Abdurrahman, A., Siriattakul, P. et al. Pattern recognition and features selection for speech emotion recognition model using deep learning. Int J Speech Technol 23, 799–806 (2020). https://doi.org/10.1007/s10772-020-09690-2