Performance Evaluation of a Speech Interface for Motorcycle Environment

  • Iosif Mporas
  • Todor Ganchev
  • Otilia Kocsis
  • Nikos Fakotakis
Part of the IFIP International Federation for Information Processing book series (IFIPAICT, volume 296)

Abstract

In the present work, we investigate the performance of a number of traditional and recent speech enhancement algorithms under the adverse, non-stationary conditions that are characteristic of a motorcycle on the move. The algorithms are ranked by the improvement they contribute to the speech recognition rate relative to the baseline result, i.e. recognition without speech enhancement. Experiments on the MoveOn motorcycle speech and noise database suggest that the ranking of algorithms by human perception of speech quality does not coincide with their ranking by speech recognition performance. The multi-band spectral subtraction method was observed to yield the highest speech recognition performance.
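
The ranking criterion described in the abstract, improvement in recognition rate over a no-enhancement baseline, can be illustrated with a minimal sketch. It assumes word accuracy (computed from the word-level edit distance) as the recognition rate, which is one common definition and not necessarily the exact metric used in the paper. The function names, the placeholder labels `method_a` and `method_b`, and the toy transcripts are hypothetical and are not taken from the MoveOn data or the reported results.

```python
from typing import Dict, List, Tuple


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(references: List[str], hypotheses: List[str]) -> float:
    """Word accuracy in percent over a set of utterances (assumed metric)."""
    total_words = sum(len(r.split()) for r in references)
    total_errors = sum(word_errors(r.split(), h.split())
                       for r, h in zip(references, hypotheses))
    return 100.0 * (total_words - total_errors) / total_words


def rank_by_improvement(references: List[str],
                        baseline_hyps: List[str],
                        enhanced_hyps: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Sort methods by accuracy gain over the no-enhancement baseline."""
    baseline = word_accuracy(references, baseline_hyps)
    gains = {name: word_accuracy(references, hyps) - baseline
             for name, hyps in enhanced_hyps.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy transcripts, invented purely for illustration (not experimental data).
references = ["turn left at the next junction", "call the control room"]
baseline_hyps = ["turn left the next", "call control room"]
enhanced_hyps = {
    "method_b": ["turn left at next junction", "call the control room"],
    "method_a": ["turn left at the next junction", "call the control room"],
}

ranking = rank_by_improvement(references, baseline_hyps, enhanced_hyps)
for name, gain in ranking:
    print(f"{name}: {gain:+.1f} points of word accuracy vs. baseline")
```

Ranking by gain over the baseline, rather than by a perceptual quality score, reflects the abstract's observation that the two orderings need not agree.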

Keywords

Speech Recognition; Speech Signal; Speech Quality; Speech Enhancement; Spectral Subtraction

Copyright information

© IFIP International Federation for Information Processing 2009

Authors and Affiliations

  • Iosif Mporas (1)
  • Todor Ganchev (1)
  • Otilia Kocsis (1)
  • Nikos Fakotakis (1)

  1. Artificial Intelligence Group, Wire Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, Rion, Greece
