Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2006)

Abstract

The Augmented Multi-party Interaction (AMI) project is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms, and with the component technologies these require. Its R&D themes are group dynamics; audio, visual, and multimodal processing; content abstraction; and human-computer interaction. The audio-visual processing workpackage within AMI addresses automatic recognition from audio, video, and combined audio-video streams that have been recorded during meetings. In this article we describe the progress made in the first two years of the project. We show how the large problem of audio-visual processing in meetings can be split into seven questions, such as “Who is acting during the meeting?”. We then show which algorithms and methods have been developed and evaluated to answer these questions automatically.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Al-Hames, M. et al. (2006). Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_3

  • DOI: https://doi.org/10.1007/11965152_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69267-6

  • Online ISBN: 978-3-540-69268-3

  • eBook Packages: Computer Science, Computer Science (R0)
