Abstract
Modern computing systems are usually equipped with input devices such as microphones and cameras, which makes it possible to identify the user of such a system. User identification is important in many human-computer interaction (HCI) scenarios, such as speech recognition, activity recognition, transcription of meeting room data, or affective computing, where personalized models may significantly improve the performance of the overall recognition system. This paper deals with audio-visual user identification. The main processing steps are segmentation of the relevant parts of the video and audio streams, extraction of meaningful features, and construction of the overall classifier and fusion architectures. The proposed system has been evaluated on the MOBIO dataset, a benchmark database consisting of real-world recordings collected from mobile devices, e.g. cell phones. Recognition rates of up to 92% could be achieved for the proposed audio-visual classifier system.
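The abstract describes a pipeline that combines separately trained audio and video classifiers into one decision. The paper's exact combination rule is not given here, so the following is only a minimal sketch of one common approach, score-level (late) fusion via a weighted sum of per-user posterior scores; the function name `fuse_scores`, the weight `w_audio`, and the example scores are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def fuse_scores(audio_scores, video_scores, w_audio=0.5):
    """Weighted-sum late fusion of per-user scores from two modalities.

    audio_scores, video_scores: sequences of per-user scores (same length,
    one entry per enrolled user). Returns the identified user index and
    the fused score vector.
    """
    audio = np.asarray(audio_scores, dtype=float)
    video = np.asarray(video_scores, dtype=float)
    fused = w_audio * audio + (1.0 - w_audio) * video
    return int(np.argmax(fused)), fused

# Hypothetical per-user scores from the two modality classifiers
audio = [0.2, 0.7, 0.1]   # audio classifier favours user 1
video = [0.1, 0.6, 0.3]   # video classifier agrees
user_id, fused = fuse_scores(audio, video, w_audio=0.5)
print(user_id)  # -> 1
```

The weight `w_audio` would in practice be tuned on validation data, since the relative reliability of the two modalities varies with recording conditions (e.g. noisy audio on mobile devices).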
Acknowledgement
This work was partially supported by the Transregional Collaborative Research Center SFB/TRR 62 "Companion-Technology for Cognitive Technical Systems", funded by the German Research Foundation (DFG), and by a scholarship of the Landesgraduiertenförderung Baden-Württemberg at Ulm University (M. Kächele).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kächele, M., Meudt, S., Schwarz, A., Schwenker, F. (2015). Audio-Visual User Identification in HCI Scenarios. In: Schwenker, F., Scherer, S., Morency, LP. (eds) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2014. Lecture Notes in Computer Science(), vol 8869. Springer, Cham. https://doi.org/10.1007/978-3-319-14899-1_11
DOI: https://doi.org/10.1007/978-3-319-14899-1_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14898-4
Online ISBN: 978-3-319-14899-1
eBook Packages: Computer Science (R0)