Optical Memory and Neural Networks, Volume 23, Issue 1, pp 34–42

About neural-network algorithms application in viseme classification problem with face video in audiovisual speech recognition systems

Article

DOI: 10.3103/S1060992X14010068

Cite this article as:
Savchenko, A.V. & Khokhlova, Y.I. Opt. Mem. Neural Networks (2014) 23: 34. doi:10.3103/S1060992X14010068

Abstract

The paper considers phoneme recognition from a speaker's facial expressions in voice-activated control systems. We have developed a neural-network recognition algorithm based on the phonetic word decoding method and the requirement that voice commands be pronounced as isolated syllables. The paper presents experimental results on the classification of visemes (the facial and lip positions corresponding to a particular phoneme) for Russian vowels. We show how the classification accuracy depends on the classifier (multilayer feed-forward network, support vector machine, k-nearest neighbor method), the image features (histogram of oriented gradients, eigenvectors, SURF local descriptors), and the type of camera (built-in or Kinect). The best speaker-dependent recognition accuracy is 85% for a built-in camera and 96% for Kinect depth maps, obtained when classification uses the histogram of oriented gradients with the support vector machine.
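As a rough illustration of the feature-plus-classifier pipeline the abstract describes, the sketch below computes a gradient-orientation histogram for an image and classifies it with a nearest-neighbor rule. It is not the authors' implementation. The histogram is a drastically simplified, single-cell stand-in for the histogram of oriented gradients (no cells, blocks, or block normalization), a 1-nearest-neighbor rule is used instead of the support vector machine to keep the example dependency-free, and the toy bar images, the labels "wide" and "tall", and all function names are hypothetical.

```python
import math


def orientation_histogram(image, bins=9):
    """Simplified stand-in for HOG: one unsigned orientation histogram
    over the whole image, weighted by gradient magnitude."""
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]  # L2-normalized feature vector


def make_bar(size, horizontal, offset):
    """Hypothetical toy image: a two-pixel-thick bar on a dark background."""
    image = [[0.0] * size for _ in range(size)]
    for i in range(2, size - 2):
        for t in (offset, offset + 1):
            if horizontal:
                image[t][i] = 1.0
            else:
                image[i][t] = 1.0
    return image


def nearest_neighbor(train, query):
    """Return the label of the training feature vector closest to query."""
    best_label, best_dist = None, float("inf")
    for features, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical two-class training set: horizontal vs. vertical bars.
train = [
    (orientation_histogram(make_bar(12, True, 3)), "wide"),
    (orientation_histogram(make_bar(12, True, 5)), "wide"),
    (orientation_histogram(make_bar(12, False, 3)), "tall"),
    (orientation_histogram(make_bar(12, False, 5)), "tall"),
]

# Classify bars at a position not seen during training.
pred_wide = nearest_neighbor(train, orientation_histogram(make_bar(12, True, 7)))
pred_tall = nearest_neighbor(train, orientation_histogram(make_bar(12, False, 7)))
print(pred_wide, pred_tall)
```

Because the histogram discards pixel positions, a bar shifted to a new offset yields the same feature vector as the training bars of the same orientation, which is why the nearest-neighbor rule separates the two toy classes. Real mouth-region images would need a full HOG descriptor and a trained classifier such as the support vector machine named in the abstract.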

Keywords

neural-network-aided image recognition; audiovisual speech recognition; phonetic decoding method; histogram of oriented gradients; support vector machine; Kinect

Copyright information

© Allerton Press, Inc. 2014

Authors and Affiliations

  1. Department of Business Informatics and Applied Mathematics, National Research University High School of Economics, Nizhni Novgorod, Russia
