Evaluating the Performance of a Speech Recognition Based System

  • Vinod Kumar Pandey
  • Sunil Kumar Kopparapu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 192)


Speech based solutions have taken center stage with the growth of the services industry, where there is a need to cater to a very large number of people from all strata of society. While natural language speech interfaces are much discussed in the research community, in practice menu based speech solutions thrive. Typically, in a menu based speech solution the user is required to respond by speaking a word from a closed set when prompted by the system. A sequence of human speech responses to the interactive voice response (IVR) prompts results in the completion of a transaction. A transaction is deemed successful if the speech solution correctly recognizes all the spoken utterances of the user whenever prompted by the system. The usual mechanism to evaluate the performance of a speech solution is to test the system extensively by putting it to use by actual people and then analyzing the logs for successful transactions. This kind of evaluation could lead to dissatisfied test users, especially if the system performs poorly and yields a low transaction completion rate. To mitigate this, the Wizard of Oz approach is adopted during evaluation of a speech system. Overall, this kind of evaluation is an expensive proposition in terms of both time and cost. In this paper, we propose a method to evaluate the performance of a speech solution without actually putting it to use by people. We first describe the methodology and then show experimentally that it can be used to identify the performance bottlenecks of the speech solution even before the system is actually used, thus saving evaluation time and expense.
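
The transaction success criterion described in the abstract can be illustrated with a minimal sketch. This is not the evaluation method proposed in the paper; it only shows how a transaction completion rate might be computed from logs, assuming a hypothetical log layout in which each transaction is a list of (expected word, recognized word) pairs. The names `Transaction`, `is_successful`, and `completion_rate` are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical log layout (an assumption, not from the paper): one transaction
# is the sequence of (expected word, recognized word) pairs, one per prompt.
Transaction = List[Tuple[str, str]]


def is_successful(transaction: Transaction) -> bool:
    """A transaction succeeds only if every utterance was recognized correctly."""
    return all(expected == recognized for expected, recognized in transaction)


def completion_rate(transactions: List[Transaction]) -> float:
    """Fraction of logged transactions in which all utterances were recognized."""
    if not transactions:
        return 0.0
    return sum(is_successful(t) for t in transactions) / len(transactions)


# Made-up example data: the second transaction fails on its second prompt.
logs = [
    [("balance", "balance"), ("savings", "savings")],
    [("transfer", "transfer"), ("savings", "current")],
]
print(completion_rate(logs))  # 0.5
```

Because a single misrecognized utterance fails the whole transaction, the completion rate can be much lower than the per-word recognition accuracy when transactions involve several prompts.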


Speech solution evaluation; Speech recognition; Pre-launch recognition performance measure





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Vinod Kumar Pandey (1)
  • Sunil Kumar Kopparapu (1)
  1. TCS Innovation Labs - Mumbai, Tata Consultancy Services, Thane, India
