Dialect Identification Using Spectral and Prosodic Features on Single and Ensemble Classifiers

  • Research Article - Special Issue - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

This paper investigates the significance of the spectral and prosodic behavior of the speech signal for dialect identification. Spectral features such as cepstral coefficients, spectral flux, and entropy are extracted from shorter frames, while prosodic attributes such as pitch, energy, and duration are derived from longer frames. The approach is evaluated on the IViE (Intonational Variation in English) speech corpus, which covers nine dialectal regions of the British Isles. Because the corpus contains both read and semi-spontaneous speech, the influence of spectral and prosodic behavior is examined separately on each of these datasets. Two classification approaches are used to identify the nine dialects: a single support vector machine (SVM), and ensemble methods built from decision trees along with SVMs. Feature vectors are constructed from the dialect-discriminating information captured by both feature types, and experiments are conducted on individual features as well as on their combinations. The ensemble methods yield better dialect recognition performance than a single SVM.
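
The abstract does not state frame sizes or exact feature definitions. As a rough illustration only, the NumPy sketch below computes short-frame spectral features (spectral flux, spectral entropy, and a plain real cepstrum standing in for the paper's cepstral coefficients) and longer-frame prosodic features (log energy and an autocorrelation pitch estimate), then pools them into one utterance-level vector. The 25 ms and 50 ms frame lengths, the 60 to 400 Hz pitch range, the voicing threshold, and the use of voiced-frame ratio and utterance length as crude duration cues are all assumptions, and every function name here is hypothetical.

```python
import numpy as np

# Sketch only: frame sizes, thresholds, and feature definitions are assumptions,
# not the configuration used in the paper.


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_features(x, sr, frame_ms=25, hop_ms=10, n_cep=13):
    """Short frames: spectral flux, spectral entropy, real-cepstrum coefficients."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # Normalize each magnitude spectrum so it sums to 1 (a distribution over bins).
    p = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)
    flux = np.sqrt(np.sum(np.diff(p, axis=0) ** 2, axis=1))  # frame-to-frame change
    entropy = -np.sum(p * np.log2(p + 1e-12), axis=1)        # spread of energy over bins
    # Real cepstrum: inverse FFT of the log magnitude (a stand-in, not MFCCs).
    cep = np.fft.irfft(np.log(mag + 1e-12), axis=1)[:, :n_cep]
    return flux, entropy, cep


def prosodic_features(x, sr, frame_ms=50, hop_ms=25, fmin=60.0, fmax=400.0):
    """Longer frames: log energy and an autocorrelation pitch estimate (0 = unvoiced)."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-12)
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    pitch = np.zeros(len(frames))
    for i, frame in enumerate(frames):
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]  # lags 0..N-1
        if ac[0] <= 0:
            continue  # silent frame
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        if ac[lag] / ac[0] > 0.3:  # crude voicing threshold (assumption)
            pitch[i] = sr / lag
    return energy, pitch


def utterance_vector(x, sr):
    """Pool frame-level features into one fixed-length vector per utterance."""
    flux, entropy, cep = spectral_features(x, sr)
    energy, pitch = prosodic_features(x, sr)
    voiced = pitch[pitch > 0]
    f0 = [voiced.mean(), voiced.std()] if voiced.size else [0.0, 0.0]
    return np.concatenate([
        cep.mean(axis=0), cep.std(axis=0),                  # 2 x n_cep cepstral stats
        [flux.mean(), flux.std(), entropy.mean(), entropy.std()],
        [energy.mean(), energy.std()], f0,
        [voiced.size / len(pitch), len(x) / sr],             # crude duration cues
    ])
```

Stacking one such vector per utterance gives a feature matrix that either classifier family can consume. The pitch estimator is deliberately naive; a dedicated pitch tracker would be more robust on real, noisy speech.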

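To illustrate the single-versus-ensemble comparison, here is a minimal scikit-learn sketch that cross-validates one RBF-kernel SVM against several ensembles: random forest, extra trees, gradient boosting, and bagged SVMs. It runs on synthetic nine-class data from `make_classification`, not on IViE features, and the particular ensembles, hyperparameters, and 5-fold protocol are assumptions rather than the paper's configuration; the printed accuracies say nothing about the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for utterance-level feature vectors: 9 classes, one per
# dialect region (assumption: 36 features, 60 utterances per class).
X, y = make_classification(n_samples=540, n_features=36, n_informative=20,
                           n_classes=9, n_clusters_per_class=1, random_state=0)

models = {
    # Single classifier: one RBF-kernel SVM on standardized features.
    "SVM (RBF)": make_pipeline(StandardScaler(),
                               SVC(kernel="rbf", C=10, gamma="scale")),
    # Tree ensembles.
    "Random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "Extra trees": ExtraTreesClassifier(n_estimators=300, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    # SVM ensemble: each SVM is trained on a bootstrap sample of 80% of the rows.
    "Bagged SVMs": make_pipeline(
        StandardScaler(),
        BaggingClassifier(SVC(kernel="rbf", C=10, gamma="scale"),
                          n_estimators=10, max_samples=0.8, random_state=0)),
}

# Mean 5-fold (stratified) accuracy per model.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name:18s} {s:.3f}")
```

Scaling sits inside each SVM pipeline so that it is refit on every training fold; the tree ensembles are left unscaled because their split thresholds are insensitive to monotonic feature scaling.
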
Author information

Correspondence to Nagaratna B. Chittaragi.

About this article

Cite this article

Chittaragi, N.B., Prakash, A. & Koolagudi, S.G. Dialect Identification Using Spectral and Prosodic Features on Single and Ensemble Classifiers. Arab J Sci Eng 43, 4289–4302 (2018). https://doi.org/10.1007/s13369-017-2941-0

  • DOI: https://doi.org/10.1007/s13369-017-2941-0
