Speech Recognition Using Novel Diatonic Frequency Cepstral Coefficients and Hybrid Neuro Fuzzy Classifier

Kondhalkar, Himgauri; Mukherji, Prachi

doi:10.1007/978-3-030-00665-5_76

Part of the book series: Lecture Notes in Computational Vision and Biomechanics (LNCVB, volume 30)

Included in the following conference series:

International Conference on ISMAC in Computational Vision and Bio-Engineering

Abstract

Speech recognition is the ability of a machine to identify spoken words and classify them into the appropriate category. The first stage of speech recognition is the extraction of appropriate features from the recorded words. We propose a novel feature extraction algorithm based on diatonic frequency cepstral coefficients. Diatonic frequencies are derived from a musical scale called the diatonic scale. The scale is based on the harmonics of sound and models the nonlinear behavior of the human auditory filter. After feature extraction, the classification stage uses a hybrid classifier that combines an artificial neural network with fuzzy logic. When the difference between the prediction values at the output of the neural network is small, the classifier can match the wrong pattern. The proposed algorithm overcomes this drawback using fuzzy logic. The proposed hybrid classifier improves the recognition rate significantly over existing classifiers. The test bed used in the experiments focuses on Marathi, the native language spoken in the state of Maharashtra.
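
The feature extraction idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the diatonic scale is taken in just intonation (ratios 1, 9/8, 5/4, 4/3, 3/2, 5/3, 15/8 within each octave), that triangular filters are centered on those note frequencies, and that the cepstral coefficients are a DCT-II of the log filter-bank energies, as in conventional cepstral analysis. The abstract specifies none of these details, and the function names and the base frequency are hypothetical.

```python
import math

# Just-intonation ratios of the major diatonic scale. This tuning is an
# assumption; the abstract does not say which ratios or base frequency are used.
DIATONIC_RATIOS = [1, 9 / 8, 5 / 4, 4 / 3, 3 / 2, 5 / 3, 15 / 8]


def diatonic_centers(base_hz, max_hz):
    """Return diatonic note frequencies from base_hz up to max_hz (inclusive)."""
    centers = []
    octave = 0
    while True:
        for ratio in DIATONIC_RATIOS:
            f = base_hz * ratio * 2 ** octave
            if f > max_hz:
                return centers
            centers.append(f)
        octave += 1


def triangular_filterbank(centers, bin_hz, n_bins):
    """Build one triangular filter per interior center frequency.

    Each filter rises from the previous center to its own center and
    falls to zero at the next center. bin_hz is the spacing of the
    spectrum bins in Hz.
    """
    bank = []
    for lo, mid, hi in zip(centers, centers[1:], centers[2:]):
        weights = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def cepstral_coefficients(power_spectrum, bank, n_coeffs):
    """Log filter-bank energies followed by an unnormalized DCT-II."""
    eps = 1e-10  # avoids log(0) for filters that capture no energy
    log_energies = [
        math.log(sum(w * p for w, p in zip(weights, power_spectrum)) + eps)
        for weights in bank
    ]
    m = len(log_energies)
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / m)
            for j, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]
```

Because the note frequencies double every octave, the filters are narrow at low frequencies and wide at high frequencies, which is the nonlinear spacing the abstract attributes to the human auditory filter. In practice the power spectrum would come from a windowed FFT of each speech frame; that step is omitted here.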

Author information

Authors and Affiliations

Sinhgad College of Engineering, Pune, India
Himgauri Kondhalkar
Cummins College of Engineering, Pune, India
Prachi Mukherji

Corresponding author

Correspondence to Himgauri Kondhalkar.

Editor information

Editors and Affiliations

SCAD Institute of Technology, Palladam, India
Durai Pandian
Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada
Xavier Fernando
School of Computer and Security Science, Edith Cowan University, Joondalup, WA, Australia
Zubair Baig
Wenzhou Medical University, Wenzhou, China
Fuqian Shi

About this paper

Cite this paper

Kondhalkar, H., Mukherji, P. (2019). Speech Recognition Using Novel Diatonic Frequency Cepstral Coefficients and Hybrid Neuro Fuzzy Classifier. In: Pandian, D., Fernando, X., Baig, Z., Shi, F. (eds) Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB). ISMAC 2018. Lecture Notes in Computational Vision and Biomechanics, vol 30. Springer, Cham. https://doi.org/10.1007/978-3-030-00665-5_76

DOI: https://doi.org/10.1007/978-3-030-00665-5_76
Published: 02 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00664-8
Online ISBN: 978-3-030-00665-5
eBook Packages: Engineering, Engineering (R0)
