MFCC-GMM based accent recognition system for Telugu speech signals

International Journal of Speech Technology

Abstract

Speech processing is an important research area that includes speaker recognition, speech synthesis, speech coding, and speech noise reduction. Many languages have different speaking styles, called accents or dialects. Identifying the accent before speech recognition can improve the performance of speech recognition systems, and the more accents a language has, the more crucial accent recognition becomes. Telugu is an Indian language widely spoken in the southern part of India. It has several accents, the main ones being coastal Andhra, Telangana, and Rayalaseema. In the present work, speech samples are collected from native speakers of the different Telugu accents for both training and testing. Mel frequency cepstral coefficient (MFCC) features are extracted from each training and test sample. A Gaussian mixture model (GMM) is then used to classify the speech by accent. The proposed system recognizes the region a speaker belongs to, based on accent, with an overall efficiency of 91%.
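
The GMM classification step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: one diagonal-covariance GMM is trained per accent with plain EM, and a test utterance is assigned to the accent whose model gives the highest average log-likelihood. The abstract does not give the number of coefficients, the number of mixture components, or the covariance type, so those values are assumed here, and all function names are hypothetical. Synthetic 13-dimensional vectors stand in for real MFCC frames, which would normally come from a feature extractor applied to the recorded speech.

```python
import numpy as np


def component_log_probs(frames, weights, means, variances):
    """Return log(w_k * N(x_t | mu_k, diag(var_k))), shape (n_frames, n_components)."""
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]
    ).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss


def fit_diag_gmm(frames, n_components=2, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM to feature frames with plain EM."""
    rng = np.random.default_rng(seed)
    n = frames.shape[0]
    # Initialise means from randomly chosen frames and use the global variance.
    means = frames[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(frames.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = component_log_probs(frames, weights, means, variances)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: re-estimate weights, means, and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ frames) / nk[:, None]
        second_moment = (resp.T @ frames**2) / nk[:, None]
        variances = np.maximum(second_moment - means**2, 1e-6)
    return weights, means, variances


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    return np.logaddexp.reduce(component_log_probs(frames, *gmm), axis=1).mean()


def classify(frames, models):
    """Pick the accent whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda accent: avg_log_likelihood(frames, models[accent]))


ACCENTS = ["coastal Andhra", "Telangana", "Rayalaseema"]
N_COEFFS = 13  # assumed MFCC dimensionality, not stated in the abstract
rng = np.random.default_rng(42)


def fake_frames(accent_index, n_frames):
    """Synthetic stand-in for MFCC frames: one shifted Gaussian cloud per accent."""
    centre = np.zeros(N_COEFFS)
    centre[accent_index] = 4.0
    return centre + rng.normal(size=(n_frames, N_COEFFS))


# Train one model per accent, then score an unseen "utterance".
models = {name: fit_diag_gmm(fake_frames(i, 400)) for i, name in enumerate(ACCENTS)}
predicted = classify(fake_frames(1, 200), models)
print(predicted)
```

Scoring by average rather than summed log-likelihood keeps utterances of different lengths comparable. With real recordings, the synthetic `fake_frames` helper would be replaced by actual MFCC extraction, and the number of mixture components would need tuning on held-out data.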

Author information

Correspondence to Kasiprasad Mannepalli.

Cite this article

Mannepalli, K., Sastry, P.N. & Suman, M. MFCC-GMM based accent recognition system for Telugu speech signals. Int J Speech Technol 19, 87–93 (2016). https://doi.org/10.1007/s10772-015-9328-y

  • DOI: https://doi.org/10.1007/s10772-015-9328-y