Speech Recognition Combining MFCCs and Image Features

  • Stamatis Karlos
  • Nikos Fazakis
  • Katerina Karanikola
  • Sotiris Kotsiantis
  • Kyriakos Sgarbas
Conference paper

DOI: 10.1007/978-3-319-43958-7_79

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9811)
Cite this paper as:
Karlos S., Fazakis N., Karanikola K., Kotsiantis S., Sgarbas K. (2016) Speech Recognition Combining MFCCs and Image Features. In: Ronzhin A., Potapova R., Németh G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham

Abstract

Automatic speech recognition (ASR) is a well-known problem spanning fields such as Natural Language Processing (NLP), Digital Signal Processing (DSP) and Machine Learning (ML). In this work, a robust supervised classification model (MFCCs + autocor + SVM) is presented that builds on feature extraction from solo speech signals. Mel Frequency Cepstral Coefficients (MFCCs) are combined with Content Based Image Retrieval (CBIR) features extracted from the spectrogram of each frame of the speech signal. The improvement in classification accuracy obtained with these extended feature vectors, relative to using MFCCs alone, is examined with several classifiers in three scenarios involving different numbers of speakers.
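
The abstract describes the extended feature vector only at a high level. The Python sketch below illustrates one way such a vector could be assembled for a single frame: MFCCs computed from the frame, concatenated with an auto-correlogram (a classic CBIR image descriptor) computed on a quantized log-power spectrogram of that same frame. It is a standard-library-only sketch under stated assumptions, not the authors' implementation. It assumes that "autocor" refers to an auto-correlogram descriptor and implements a simplified grayscale variant. The sample rate, frame length, filter and gray-level counts, correlogram distances, and all function names (`mfcc`, `spectrogram`, `quantize`, `auto_correlogram`, `extended_features`) are hypothetical.

```python
import math

# Illustrative parameters: assumptions, not values reported in the paper.
SAMPLE_RATE = 8000    # Hz
N_MFCC = 13           # cepstral coefficients kept per frame
N_MEL = 20            # triangular mel filters
N_LEVELS = 4          # gray levels used to quantize the spectrogram image
DISTANCES = (1, 3)    # correlogram distances in the chessboard (L-infinity) metric

def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum of a real frame, bins 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centers equally spaced on the mel scale."""
    top = 2595 * math.log10(1 + (sr / 2) / 700)
    hz = [700 * (10 ** (top * i / (n_filters + 1) / 2595) - 1)
          for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sr) for f in hz]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            row[k] = (k - left) / (center - left)
        for k in range(center, right):
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def dct2(x, n_out):
    """Unnormalized DCT-II; returns the first n_out coefficients."""
    n = len(x)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(x))
            for k in range(n_out)]

def mfcc(frame, sr=SAMPLE_RATE):
    """Hamming window -> power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum([x * w for x, w in zip(frame, hamming(len(frame)))])
    energies = [sum(f * s for f, s in zip(row, spec))
                for row in mel_filterbank(N_MEL, len(frame), sr)]
    return dct2([math.log(e + 1e-10) for e in energies], N_MFCC)

def spectrogram(frame, win_len=64, hop=32):
    """Log-power image: rows are sub-windows (time), columns are frequency bins."""
    win = hamming(win_len)
    return [[math.log(p + 1e-10) for p in
             power_spectrum([x * w for x, w in zip(frame[s:s + win_len], win)])]
            for s in range(0, len(frame) - win_len + 1, hop)]

def quantize(img, levels=N_LEVELS):
    """Map each value linearly onto integer gray levels 0..levels-1."""
    lo = min(min(r) for r in img)
    span = (max(max(r) for r in img) - lo) or 1.0
    return [[min(levels - 1, int(levels * (v - lo) / span)) for v in r] for r in img]

def auto_correlogram(q, levels=N_LEVELS, distances=DISTANCES):
    """Simplified grayscale auto-correlogram (assumed stand-in for "autocor").

    For each distance d and gray level c: the fraction of pixels at chessboard
    distance d from a c-pixel that also have level c. Output is grouped by d.
    """
    h, w = len(q), len(q[0])
    feats = []
    for d in distances:
        ring = [(dy, dx) for dy in range(-d, d + 1) for dx in range(-d, d + 1)
                if max(abs(dy), abs(dx)) == d]
        same, total = [0] * levels, [0] * levels
        for y in range(h):
            for x in range(w):
                c = q[y][x]
                for dy, dx in ring:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        total[c] += 1
                        if q[y + dy][x + dx] == c:
                            same[c] += 1
        feats.extend(s / t if t else 0.0 for s, t in zip(same, total))
    return feats

def extended_features(frame):
    """MFCCs of the frame followed by auto-correlogram features of its spectrogram."""
    return mfcc(frame) + auto_correlogram(quantize(spectrogram(frame)))

if __name__ == "__main__":
    # Synthetic 440 Hz tone standing in for one 512-sample (64 ms) speech frame.
    frame = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(512)]
    vec = extended_features(frame)
    print(len(vec))  # 13 MFCCs + 4 levels x 2 distances = 21
```

Vectors built this way could then be passed to any classifier, such as an SVM, and compared against MFCC-only vectors, which mirrors the comparison the abstract describes. The naive DFT only keeps the sketch dependency-free; a practical pipeline would use an FFT and an established audio library.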

Keywords

ASR, MFCCs, Supervised model, Feature extraction, CBIR features

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Stamatis Karlos (1)
  • Nikos Fazakis (1)
  • Katerina Karanikola (1)
  • Sotiris Kotsiantis (1)
  • Kyriakos Sgarbas (1)
  1. University of Patras, Patras, Greece
