Abstract
We describe an adaptation of independent component analysis (ICA) and missing feature theory (MFT)-based automatic speech recognition (ASR) to the recognition of two simultaneous continuous speech signals. We previously reported the utility of this system for isolated word recognition, but the performance of MFT-based ASR depends on the configuration, such as the acoustic model, so the system needs to be evaluated under more general conditions. The system first separates the sound sources with ICA. It then estimates the spectral distortion in the separated sounds to generate missing feature masks (MFMs). Finally, MFT-based ASR recognizes the separated sounds. We estimate spectral distortion in the time-frequency domain in terms of feature vectors and generate MFMs accordingly. We tested both isolated word recognition and continuous speech recognition with cepstral and spectral features. The resulting system outperformed the baseline robot audition system by 13 points and 6 points, respectively, with spectral features.
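The pipeline sketched in the abstract can be illustrated with a toy example. This is not the authors' implementation; the thresholding rule, the unit-variance Gaussian scorer, and all variable names are illustrative assumptions. It shows only the last two steps: generating a hard MFM by comparing a separated spectrum against an estimate of the residual distortion, and scoring features MFT-style so that unreliable dimensions do not contribute.

```python
import numpy as np

def generate_mfm(separated_spec, distortion_est, threshold=0.5):
    """Hard missing feature mask (illustrative): a time-frequency cell is
    reliable (1.0) when the estimated distortion is small relative to the
    separated energy, unreliable (0.0) otherwise."""
    ratio = distortion_est / (separated_spec + 1e-12)  # avoid division by zero
    return (ratio < threshold).astype(float)

def masked_score(features, means, mask):
    """Marginalisation-style likelihood score in the spirit of MFT-based ASR:
    only dimensions marked reliable contribute (diagonal unit-variance
    Gaussian assumed purely for simplicity)."""
    diff2 = (features - means) ** 2
    return -0.5 * np.sum(mask * diff2)

# Toy data: four spectral features; the third is heavily distorted.
spec = np.array([1.0, 2.0, 0.5, 3.0])
dist = np.array([0.05, 0.10, 2.0, 0.10])
mask = generate_mfm(spec, dist)
print(mask)  # the distorted third cell is masked out: [1. 1. 0. 1.]
```

Under this sketch, the masked dimension has no effect on the score, which is the core idea of marginalisation-based MFT recognition: distorted evidence is ignored rather than allowed to corrupt the acoustic match.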
© 2007 Springer Berlin Heidelberg
Cite this paper
Takeda, R., Yamamoto, S., Komatani, K., Ogata, T., Okuno, H.G. (2007). Evaluation of Two Simultaneous Continuous Speech Recognition with ICA BSS and MFT-Based ASR. In: Okuno, H.G., Ali, M. (eds) New Trends in Applied Artificial Intelligence. IEA/AIE 2007. Lecture Notes in Computer Science(), vol 4570. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73325-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73322-5
Online ISBN: 978-3-540-73325-6