Performances of Speech Signal Biometric Systems Based on Signal to Noise Ratio Degradation

  • Dzati Athiar Ramli
  • Salina Abdul Samad
  • Aini Hussain
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 85)


This study evaluates the performance of speech-based biometric systems at different signal-to-noise ratio (SNR) levels: clean, 30 dB, 20 dB and 10 dB. It also proposes integrating visual information into the speech-based systems to improve on audio-only performance. The weighting factor for combining the audio and visual scores is optimized on a validation data set, and min-max normalization is applied in the fusion scheme. Incorporating visual information increases decision accuracy compared with the audio-only system. The equal error rate (EER) of the integrated system at clean, 30 dB, 20 dB and 10 dB SNR is 0.0019%, 0.0084%, 0.9356% and 5.0160%, respectively, compared with 1.1599%, 2.5113%, 19.3423% and 39.8649% for the audio-only system. A Support Vector Machine (SVM) classifier is used for pattern matching, and Mel Frequency Cepstral Coefficients (MFCC) are extracted as audio features.
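The score-level fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact fusion rule or search procedure, so this assumes a weighted sum of min-max normalized scores and a simple grid search over the weight. The function names, the grid size and the toy scores are hypothetical and are not taken from the study.

```python
def min_max_normalize(scores):
    """Rescale scores linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores equal, nothing to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(audio, visual, w):
    """Weighted-sum fusion (assumed rule): w on audio, 1 - w on visual."""
    return [w * a + (1.0 - w) * v for a, v in zip(audio, visual)]


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over the observed scores
    and taking the point where FAR and FRR are closest."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def select_weight(audio, visual, labels, steps=10):
    """Grid-search the fusion weight on a validation set (labels: 1 = genuine,
    0 = impostor) and return the weight with the lowest approximate EER."""
    a = min_max_normalize(audio)
    v = min_max_normalize(visual)
    best_w, best_eer = None, None
    for i in range(steps + 1):
        w = i / steps
        fused = fuse(a, v, w)
        genuine = [s for s, y in zip(fused, labels) if y == 1]
        impostor = [s for s, y in zip(fused, labels) if y == 0]
        e = equal_error_rate(genuine, impostor)
        if best_eer is None or e < best_eer:
            best_w, best_eer = w, e
    return best_w, best_eer


# Toy validation scores (made up): the audio scores overlap between classes,
# as they might under noise, while the visual scores separate them.
audio_scores = [0.9, 0.2, 0.6, 0.7]
visual_scores = [0.8, 0.9, 0.1, 0.2]
labels = [1, 1, 0, 0]
weight, eer = select_weight(audio_scores, visual_scores, labels)
```

In this toy case the search moves weight away from the overlapping audio scores, which mirrors the idea of letting the validation set decide how much each modality contributes at a given noise level.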


Keywords: Support Vector Machine, Root Mean Square, Speech Signal, Equal Error Rate, Speaker Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Dzati Athiar Ramli (1)
  • Salina Abdul Samad (2)
  • Aini Hussain (2)
  1. School of Electrical & Electronic Engineering, USM Engineering Campus, Universiti Sains Malaysia, Nibong Tebal, Pulau Pinang, Malaysia
  2. Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering, Universiti Kebangsaan Malaysia, UKM Bangi, Malaysia
