Multi Channel Sequence Processing

Bengio, Samy; Bourlard, Hervé

doi:10.1007/11559887_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3635)

Included in the following conference series:

International Workshop on Deterministic and Statistical Methods in Machine Learning

Abstract

This paper summarizes some of the current research challenges arising from multi-channel sequence processing. Many real-life applications involve the simultaneous recording and analysis of multiple information sources, which may be asynchronous, have different frame rates, exhibit different stationarity properties, and carry complementary (or correlated) information. Some of these problems can already be tackled by one of the many statistical approaches to sequence modeling. However, several challenging research issues remain open, such as accounting for asynchrony and correlation between several feature streams, or handling the resulting growth in complexity. In this framework, we discuss two novel approaches that have recently been investigated with success in the context of large multimodal problems: the asynchronous HMM, which provides a principled approach to processing multiple feature streams, and the layered HMM, which provides a good formalism for decomposing large and complex (multi-stream) problems into layered architectures. As briefly reported here, the combination of these two approaches has yielded successful results on several multi-channel tasks, ranging from audio-visual speech recognition to automatic meeting analysis.
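
The layered idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy parameters, the function names `forward_posteriors` and `stack_streams`, and the naive frame-repetition alignment are all assumptions made for illustration. A first layer runs one small discrete HMM per stream and emits per-frame state posteriors; those posteriors are then stacked into feature vectors that a second-layer model could consume. The two streams deliberately have different frame rates.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def forward_posteriors(init: Sequence[float], trans: Matrix, emit: Matrix,
                       obs: Sequence[int]) -> Matrix:
    """Filtered posteriors p(state_t | obs_1..t) for a discrete HMM.

    Uses the scaled forward recursion: each step is renormalized so the
    returned vectors sum to one and long sequences do not underflow.
    """
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    out = [alpha]
    for o in obs[1:]:
        nxt = [emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n))
               for s in range(n)]
        total = sum(nxt)
        alpha = [a / total for a in nxt]
        out.append(alpha)
    return out


def stack_streams(fast: Matrix, slow: Matrix, ratio: int) -> Matrix:
    """Concatenate per-frame vectors of two streams at the fast frame rate.

    Assumption: the slow stream is aligned by repeating each of its frames
    `ratio` times. This is the simplest possible alignment, chosen only to
    keep the sketch short.
    """
    if len(fast) != ratio * len(slow):
        raise ValueError("fast stream must be exactly ratio times longer")
    return [fast[t] + slow[t // ratio] for t in range(len(fast))]


# Hypothetical toy parameters: two hidden states, two observation symbols.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT_AUDIO = [[0.8, 0.2], [0.3, 0.7]]
EMIT_VIDEO = [[0.6, 0.4], [0.1, 0.9]]

audio_obs = [0, 0, 1, 1, 1, 0]   # "audio" stream, twice the frame rate
video_obs = [0, 1, 1]            # "video" stream

# Layer 1: one HMM per stream, producing per-frame state posteriors.
audio_post = forward_posteriors(INIT, TRANS, EMIT_AUDIO, audio_obs)
video_post = forward_posteriors(INIT, TRANS, EMIT_VIDEO, video_obs)

# Input to a second layer: stacked posteriors, one vector per fast frame.
features = stack_streams(audio_post, video_post, ratio=2)
print(len(features), len(features[0]))
```

Repeating slow-stream frames forces a fixed, synchronous alignment between the streams. That rigid alignment is precisely the kind of simplification that motivates treating asynchrony explicitly, as the abstract describes for the asynchronous HMM.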

Author information

Authors and Affiliations

IDIAP Research Institute, rue du Simplon 4, CP 592, 1920, Martigny, Switzerland
Samy Bengio & Hervé Bourlard
Swiss Federal Institute of Technology at Lausanne (EPFL), Switzerland
Hervé Bourlard

Editor information

Editors and Affiliations

Department of Computer Science, The University of Sheffield, Regent Court, 211 Portobello Street, S1 4DP, Sheffield, UK
Joab Winkler
Department of Computer Science, The University of Sheffield, Regent Court, 211 Portobello Street, S1 4DP, Sheffield, UK
Mahesan Niranjan
University of Manchester, UK
Neil Lawrence

About this paper

Cite this paper

Bengio, S., Bourlard, H. (2005). Multi Channel Sequence Processing. In: Winkler, J., Niranjan, M., Lawrence, N. (eds) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_2

DOI: https://doi.org/10.1007/11559887_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29073-5
Online ISBN: 978-3-540-31728-9
eBook Packages: Computer Science (R0)
