
Audio-Visual User Identification in HCI Scenarios

  • Conference paper

In: Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction (MPRSS 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8869)

Abstract

Modern computing systems are usually equipped with input devices such as microphones and cameras, so the user of such a system can readily be identified. User identification is important in many human-computer interaction (HCI) scenarios, such as speech recognition, activity recognition, transcription of meeting-room data, and affective computing, where personalized models may significantly improve the performance of the overall recognition system. This paper deals with audio-visual user identification. The main processing steps are segmentation of the relevant parts of the video and audio streams, extraction of meaningful features, and construction of the overall classifier and fusion architectures. The proposed system has been evaluated on the MOBIO dataset, a benchmark database of real-world recordings collected with mobile devices such as cell phones. Recognition rates of up to 92% were achieved with the proposed audio-visual classifier system.
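As a minimal illustration of the decision-level fusion step mentioned in the abstract (a sketch only, not the paper's actual architecture; the weighting scheme and the example scores are hypothetical), the per-user probability outputs of an audio classifier and a video classifier can be combined by a weighted sum before picking the most likely user:

```python
import numpy as np

def fuse_scores(audio_probs, video_probs, w_audio=0.5):
    """Late (decision-level) fusion: weighted sum of the per-user
    probability vectors from the two modality classifiers, then
    identification of the user with the highest fused score."""
    fused = w_audio * audio_probs + (1.0 - w_audio) * video_probs
    return int(np.argmax(fused))

# Hypothetical per-user probabilities for a gallery of three enrolled users.
audio = np.array([0.2, 0.5, 0.3])   # output of a speaker model
video = np.array([0.1, 0.3, 0.6])   # output of a face model
print(fuse_scores(audio, video))    # fused scores: [0.15, 0.40, 0.45] -> user 2
```

Setting `w_audio` per deployment (e.g. lowering it in noisy acoustic conditions) is one simple way such a fusion architecture can trade off the reliability of the two modalities.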



Acknowledgement

This work was partially supported by the Transregional Collaborative Research Center SFB/TRR 62 "Companion-Technology for Cognitive Technical Systems", funded by the German Research Foundation (DFG), and by a scholarship of the Landesgraduiertenförderung Baden-Württemberg at Ulm University (M. Kächele).

Author information

Corresponding author

Correspondence to Friedhelm Schwenker.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kächele, M., Meudt, S., Schwarz, A., Schwenker, F. (2015). Audio-Visual User Identification in HCI Scenarios. In: Schwenker, F., Scherer, S., Morency, L.-P. (eds.) Multimodal Pattern Recognition of Social Signals in Human-Computer-Interaction. MPRSS 2014. Lecture Notes in Computer Science, vol 8869. Springer, Cham. https://doi.org/10.1007/978-3-319-14899-1_11


  • DOI: https://doi.org/10.1007/978-3-319-14899-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14898-4

  • Online ISBN: 978-3-319-14899-1

  • eBook Packages: Computer Science; Computer Science (R0)
