Filled Pause Classification Using Energy-Boosted Mel-Frequency Cepstrum Coefficients

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 291)


Filled pauses are a type of disfluency, identified as a frequently occurring disfluency in spontaneous speech and known to affect Automatic Speech Recognition accuracy. This study analyzes the impact of boosting Mel-Frequency Cepstral Coefficients (MFCC) with an energy feature on filled pause classification. A total of 828 filled pauses from a mixture of 62 male and female speakers are classified into /mhm/, /aaa/ and /eer/. A back-propagation neural network trained with a combination of gradient descent with momentum and an adaptive learning rate is used as the classifier. The results show that energy-boosted MFCC produced a higher accuracy rate of 77 % in classifying filled pauses.
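As a rough illustration of what boosting MFCC with an energy feature could look like, the sketch below appends a per-frame log energy to an existing MFCC vector. This is an assumption, not the authors' procedure: the abstract does not say how the energy is computed or combined with the coefficients. The function names `frame_log_energy` and `boost_with_energy`, the `floor` parameter, and the toy values are all hypothetical, and the MFCC vectors are assumed to come from a separate front end.

```python
import math


def frame_log_energy(frame, floor=1e-10):
    """Log of the sum of squared samples in one frame.

    The floor avoids log(0) on silent frames (an illustrative choice).
    """
    energy = sum(sample * sample for sample in frame)
    return math.log(max(energy, floor))


def boost_with_energy(mfcc_vectors, frames):
    """Append each frame's log energy to its MFCC vector.

    mfcc_vectors: list of per-frame coefficient lists, computed elsewhere.
    frames: list of per-frame sample lists, aligned with mfcc_vectors.
    """
    if len(mfcc_vectors) != len(frames):
        raise ValueError("need exactly one MFCC vector per frame")
    return [
        list(coeffs) + [frame_log_energy(frame)]
        for coeffs, frame in zip(mfcc_vectors, frames)
    ]


# Hypothetical toy input: two frames, three placeholder coefficients each.
frames = [[0.0, 0.5, -0.5, 0.0], [1.0, -1.0, 1.0, -1.0]]
mfcc_vectors = [[1.2, -0.3, 0.1], [0.9, 0.4, -0.2]]
boosted = boost_with_energy(mfcc_vectors, frames)
```

Each boosted vector has one more dimension than its input, so a classifier fed these features would need its input layer sized accordingly.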


Malay filled pause; Energy; Mel-frequency cepstral coefficients; Energy-boosted MFCC; Artificial neural network; Gradient descent momentum



The authors gratefully acknowledge the Ministry of Higher Education Malaysia for the Fundamental Research Grant Scheme (FRGS, Grant No: 600-RMI/FRGS 5/3(48/2013)) and MARA University of Technology for providing research facilities throughout this research.



Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  • Raseeda Hamzah (1)
  • Nursuriati Jamil (1)
  • Noraini Seman (1)
  1. Faculty of Computer and Mathematical Sciences, MARA University of Technology, Shah Alam, Malaysia
