Skip to main content

Speech Recognition Using Novel Diatonic Frequency Cepstral Coefficients and Hybrid Neuro Fuzzy Classifier

  • Conference paper
  • First Online:
Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB) (ISMAC 2018)

Abstract

Speech recognition is the ability of the machine to identify spoken words and classify them into appropriate category. First stage in the process of speech recognition is the extraction of appropriate features from the recorded words. We propose a novel algorithm for feature extraction using diatonic frequency cepstral coefficients. Diatonic frequencies are derived from a musical scale called as diatonic scale. The scale is based on harmonics of sound and models nonlinear behavior of human auditory filter. After feature extraction, the next classification stage uses a hybrid classifier using artificial neural network and fuzzy logic. If the difference between prediction values available at the output of the neural network is less, the classifier matches wrong patterns. Proposed algorithm overcomes this drawback using fuzzy logic. Proposed hybrid classifier improves the recognition rate significantly over existing classifiers. Test bed used in the experimentation focuses on Marathi language. It is the native language spoken in the state of Maharashtra.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 44.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 59.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Gupta D, Bansal P, Choudhary K (2018) The state of the art of feature extraction techniques in speech recognition. In: Agrawal S, Devi A, Wason R, Bansal P (eds) Speech and language processing for human-machine communications, vol 664. Advances in intelligent systems and computing. Springer, Singapore, pp 197–207

    Chapter  Google Scholar 

  2. Lin Y, Abdulla WH (2015) Principles of psychoacoustics. Audio watermark. Springer, Cham, pp 15–49

    Chapter  Google Scholar 

  3. Shanon BJ, Paliwal KK (2003) A comparative study of filter bank spacing for speech recognition. In: Microelectronic engineering research conference, Brisbane, pp 1–3

    Google Scholar 

  4. Hsieh SH, Lu CS, Pei SC (2013) Sparse fast fourier transform by downsampling. In: IEEE International conference on acoustics, Vancouver, pp 5637–5641

    Google Scholar 

  5. Bhavsar H, Trivedi J (2018) Image based sign language recognition using neuro fuzzy approach. Int J Sci Res Comput Sci, Eng Inform Technol, IJSRCSEIT 3:487–491

    Google Scholar 

  6. Gaikwad S, Gawali B, Mehrotra S (2013) Creation of Marathi speech corpus for automatic speech recognition. In: Conference on Asian spoken language research and evaluation (O-COCOSDA/CASLRE), Gurgaon, pp 1–5

    Google Scholar 

  7. Gedam YK, Magare SS, Dabhade AC, Deshmukh RR (2014) Development of automatic speech recognition of Marathi numerals. Int J Eng Innovative Technol (IJEIT) 3:198–203

    Google Scholar 

  8. Qasim M, Nawaz S, Hussain S, Habib T (2016) Urdu speech recognition system for district names of Pakistan. In: Conference of the oriental chapter of international committee for coordination and standardization of speech databases and assessment technique, Bali, pp 28–32

    Google Scholar 

  9. Wang D, Tang Z, Tang D, Chen Q (2016) A Chinese-English Mixlingual database and a speech recognition baseline. In: Conference of the oriental chapter of international committee for coordination and standardization of speech databases and assessment technique, Bali, pp 84–88

    Google Scholar 

  10. Li W, Hu X, Gravina R, Fortino G (2017) A neuro-fuzzy fatigue tracking and classification system for wheelchair users. IEEE Access 5:19420–19431

    Article  Google Scholar 

  11. Diago L, Kitaoka T, Hagiwara I, Kambayashi T (2011) Neuro-fuzzy quantification of personal perceptions of facial images based on a limited dataset. IEEE Trans Neural Networks 22:2422–2432

    Article  Google Scholar 

  12. Tailor JH, Shah DB (2018) HMM based light weight speech recognition system for gujarati language. In: Mishra D, Nayak M, Joshi A (eds) Information and communication technology for sustainable development. Lecture notes in networks and systems, vol 10. Springer, Singapore

    Google Scholar 

  13. Samudravijaya K, Ahuja R, Bondale N, Jose T, Krishnan S, Poddar P, Raveendran R (1998) A feature based hierarchical speech recognition system for Hindi. Sadhana. 23:313–340

    Article  Google Scholar 

  14. Sneha V, Hardhika G, JeevaPriya K, Gupta D (2018) Isolated Kannada speech recognition using HTK-A detailed approach. In: Saeed K, Chaki N, Pati B, Bakshi S, Mohapatra D (eds) Process in advanced computing and intelligent engineering. Advances in intelligent systems and computing, vol 564. Springer, Singapore

    Google Scholar 

  15. Dalmiya CP, Dharun VS, Rajesh KP, (2013) An efficient method for tamil speech recognition using MFCC and DTW mobile applications. In: IEEE conference on information and communication technologies, Jeju Island, pp 1263–1268

    Google Scholar 

  16. Gaikwad S, Gawali B, Yannawar P (2010) A review on speech recognition technique. Int J Comput App 3:16–24

    Google Scholar 

  17. Ganoun A, Almerhag I (2012) Performance analysis of spoken arabic digits recognition techniques. J Electron Sci Technol 10:153–157

    Google Scholar 

  18. Jalil M, Butt FA, Malik A (2013) Short time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals. In: The international conference on technological advances in electrical, electronics and computer engineering (TAEECE), Konya, pp 208–212

    Google Scholar 

  19. Kondhalkar H, Mukherji P (2017) A database of Marathi numerals for speech data mining. Int J Adv Res Sci Eng 6:395–399

    Google Scholar 

  20. Bai Y, Wang D (2006) Fundamentals of fuzzy logic control-fuzzy sets, fuzzy rules and defuzzifications. In: Bai Y, Zhuang H, Wang D (eds) Advanced fuzzy logic technologies in industrial applications, advances in industrial control. Springer, London, pp 17–36

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Himgauri Kondhalkar .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Kondhalkar, H., Mukherji, P. (2019). Speech Recognition Using Novel Diatonic Frequency Cepstral Coefficients and Hybrid Neuro Fuzzy Classifier. In: Pandian, D., Fernando, X., Baig, Z., Shi, F. (eds) Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB). ISMAC 2018. Lecture Notes in Computational Vision and Biomechanics, vol 30. Springer, Cham. https://doi.org/10.1007/978-3-030-00665-5_76

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-00665-5_76

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00664-8

  • Online ISBN: 978-3-030-00665-5

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics