Using Audio, Visual, and Lexical Features in a Multi-modal Virtual Meeting Director

  • Marc Al-Hames
  • Benedikt Hörnler
  • Christoph Scheuermann
  • Gerhard Rigoll
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4299)


Multi-modal recordings of meetings provide the basis for meeting browsing and for remote meetings. However, it is often not useful to store or transmit all visual channels. In this work we show how a virtual meeting director selects one of seven possible video modes. We then present several audio, visual, and lexical features for such a virtual director. In an experimental section we evaluate the features, their influence on the camera selection, and the properties of the generated video stream. All chosen features allow real- or near real-time processing and can therefore be applied not only to offline browsing, but also to a remote meeting assistant.
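
The abstract does not describe how the director maps these features to a video mode. As a purely illustrative sketch (the mode names, feature fields, weights, and the minimum-shot-length rule below are assumptions, not the authors' algorithm), a per-frame selector might score the seven modes from audio, visual, and lexical cues and hold each shot for a minimum length so the output remains watchable:

    from dataclasses import dataclass

    # Seven hypothetical output modes; the paper selects among seven
    # video modes, but their exact definitions are not given here.
    MODES = [
        "closeup_p1", "closeup_p2", "closeup_p3", "closeup_p4",
        "overview", "whiteboard", "projector",
    ]

    @dataclass
    class Features:
        """One per-frame feature snapshot (all field names illustrative)."""
        speech_activity: list[float]  # audio: per-participant speech probability
        motion: list[float]           # visual: per-participant motion energy
        lexical_focus: float          # lexical: cue strength for slide/board talk

    class VirtualDirector:
        """Selects one video mode per frame; a minimum shot length avoids
        the rapid cutting that frame-wise argmax selection would produce."""

        def __init__(self, min_shot_frames: int = 50):
            self.min_shot_frames = min_shot_frames
            self.current = "overview"
            self.frames_in_shot = 0

        def score(self, f: Features) -> dict[str, float]:
            scores = {}
            for i in range(4):
                # A speaking, moving participant attracts a close-up.
                scores[f"closeup_p{i + 1}"] = (
                    0.7 * f.speech_activity[i] + 0.3 * f.motion[i]
                )
            # The overview wins when no single participant dominates.
            scores["overview"] = 1.0 - max(f.speech_activity)
            # Lexical cues ("slide", "board", ...) favour artefact views.
            scores["whiteboard"] = f.lexical_focus
            scores["projector"] = f.lexical_focus
            assert set(scores) == set(MODES)
            return scores

        def select(self, f: Features) -> str:
            self.frames_in_shot += 1
            if self.frames_in_shot < self.min_shot_frames:
                return self.current  # hold the current shot
            scores = self.score(f)
            best = max(scores, key=scores.get)
            if best != self.current:
                self.current, self.frames_in_shot = best, 0
            return self.current

    # Example: one dominant speaker eventually triggers a cut to closeup_p1.
    director = VirtualDirector(min_shot_frames=2)
    frame = Features(speech_activity=[0.9, 0.1, 0.0, 0.0],
                     motion=[0.4, 0.1, 0.1, 0.0],
                     lexical_focus=0.2)
    for _ in range(3):
        print(director.select(frame))  # overview, closeup_p1, closeup_p1

The minimum-shot-length gate stands in for whatever smoothing the authors use; without some such hysteresis, a per-frame argmax would switch cameras far too often to be watchable.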


Keywords: Visual Feature · Motion Vector · Meeting Room · Window Output · Lexical Feature





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marc Al-Hames¹
  • Benedikt Hörnler¹
  • Christoph Scheuermann¹
  • Gerhard Rigoll¹

  1. Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany
