Machine Learning for Multimodal Interaction

Lecture Notes in Computer Science, Volume 3869, pp. 230–240

Estimating the Lecturer’s Head Pose in Seminar Scenarios – A Multi-view Approach

  • Michael Voit, Interactive Systems Lab, Universität Karlsruhe (TH)
  • Kai Nickel, Interactive Systems Lab, Universität Karlsruhe (TH)
  • Rainer Stiefelhagen, Interactive Systems Lab, Universität Karlsruhe (TH)




In this paper, we present a system to track the horizontal head orientation of a lecturer in a smart seminar room equipped with several cameras. We automatically detect and track the face of the lecturer and use neural networks to classify his or her face orientation in each camera view. By combining the single-view estimates of the speaker’s head orientation from multiple cameras into one joint hypothesis, we improve the overall head pose estimation accuracy. We conducted experiments on annotated recordings from real seminars. Using the proposed fully automatic system, we correctly determine the lecturer’s head pose class (out of 8 orientation classes) 59% of the time. In 92% of the time, either the correct pose class or a neighbouring pose class (i.e., at most a 45-degree error) is estimated.
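
The abstract's fusion step, combining per-camera orientation estimates into one joint hypothesis, can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: it assumes each camera's neural network outputs a score per 45-degree pose class relative to that camera, that each class distribution can be rotated into a common room frame using the camera's known azimuth, and that the rotated distributions are fused by simple summation before taking the argmax. The function and parameter names (`to_room_frame`, `fuse_views`, `camera_azimuths_deg`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of multi-view head pose fusion over 8 horizontal
# orientation classes (45-degree bins). The actual fusion rule used in
# the paper is not specified here; a sum of per-camera scores is assumed.

N_CLASSES = 8
BIN_SIZE = 360 // N_CLASSES  # 45 degrees per class


def to_room_frame(scores, camera_azimuth_deg):
    """Shift a per-camera class distribution into the shared room frame."""
    shift = int(round(camera_azimuth_deg / BIN_SIZE)) % N_CLASSES
    return np.roll(scores, shift)


def fuse_views(per_camera_scores, camera_azimuths_deg):
    """Combine single-view estimates into one joint pose hypothesis."""
    joint = np.zeros(N_CLASSES)
    for scores, azimuth in zip(per_camera_scores, camera_azimuths_deg):
        joint += to_room_frame(np.asarray(scores, dtype=float), azimuth)
    pose_class = int(np.argmax(joint))
    return pose_class, joint / joint.sum()


# Example: two cameras, each producing softmax-like scores for 8 classes.
cam_scores = [
    [0.05, 0.60, 0.20, 0.05, 0.03, 0.03, 0.02, 0.02],  # camera at 0 deg
    [0.10, 0.15, 0.55, 0.10, 0.04, 0.03, 0.02, 0.01],  # camera at 45 deg
]
pose_class, joint_dist = fuse_views(cam_scores, camera_azimuths_deg=[0, 45])
print(f"Joint pose estimate: class {pose_class} ({pose_class * BIN_SIZE} deg)")
```

A sum (rather than a product) of scores is used here only because it is robust when one view misfires; either rule yields a single joint class hypothesis as described in the abstract.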