Multimedia Tools and Applications, Volume 70, Issue 1, pp 495–523

Robust semi-automatic head pose labeling for real-world face video sequences


DOI: 10.1007/s11042-012-1352-1

Cite this article as:
Demirkus, M., Clark, J.J. & Arbel, T. Multimed Tools Appl (2014) 70: 495. doi:10.1007/s11042-012-1352-1


Automatic head pose estimation from real-world video sequences is of great interest to the computer vision community, since pose provides prior knowledge for tasks such as face detection and classification. However, developing pose estimation algorithms requires large, labeled, real-world video databases on which computer vision systems can be trained and tested. Manually labeling each frame is tedious, time-consuming, and often difficult because of the high uncertainty in head pose angle estimates, particularly in unconstrained environments with arbitrary facial expressions, occlusion, and illumination. To overcome these difficulties, a semi-automatic framework is proposed for labeling temporal head pose in real-world video sequences. The proposed multi-stage labeling framework first detects a subset of frames with distinct head poses over a video sequence; an expert then manually labels this subset to obtain the ground truth for those frames. The framework provides a continuous head pose label and a corresponding confidence value over the pose angles. Next, an interpolation scheme over the video sequence estimates (i) labels for the frames without manual labels and (ii) corresponding confidence values for the interpolated labels. These confidence values allow an automatic head pose estimation framework to select the subset of frames to use for further processing, depending on the labeling accuracy required. Experiments on an in-house, labeled, large, real-world face video database (which will be made publicly available) show that the proposed framework achieves 96.98% labeling accuracy when manual labeling is performed on only 30% of the video frames.
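The abstract does not specify the interpolation scheme or how the confidence values are computed. As a rough illustration of the general idea only, the following Python sketch assumes plain linear interpolation of a single angle (for example, yaw in degrees) between manually labeled frames, and a hypothetical confidence that decays with the distance to the nearest labeled frame. The function name, the `decay` parameter, and the confidence formula are all assumptions made for this example, not the authors' method.

```python
from bisect import bisect_right


def interpolate_pose_labels(num_frames, keyframes, decay=0.1):
    """Fill in pose labels for unlabeled frames (illustrative sketch).

    keyframes: dict mapping frame index -> manually labeled angle in degrees.
    decay:     hypothetical rate at which confidence drops per frame of
               distance from the nearest manually labeled frame.
    Returns (labels, confidences), one entry per frame.
    """
    idx = sorted(keyframes)
    if not idx:
        raise ValueError("at least one manually labeled frame is required")

    labels, confidences = [], []
    for t in range(num_frames):
        pos = bisect_right(idx, t)
        left = idx[pos - 1] if pos > 0 else None        # last keyframe <= t
        right = idx[pos] if pos < len(idx) else None    # first keyframe > t

        if left is not None and left == t:
            # Manually labeled frame: keep the expert label as-is.
            angle, gap = keyframes[t], 0
        elif left is None:
            # Before the first keyframe: hold the first label.
            angle, gap = keyframes[right], right - t
        elif right is None:
            # After the last keyframe: hold the last label.
            angle, gap = keyframes[left], t - left
        else:
            # Between two keyframes: linear interpolation (an assumption).
            w = (t - left) / (right - left)
            angle = (1 - w) * keyframes[left] + w * keyframes[right]
            gap = min(t - left, right - t)

        labels.append(angle)
        # Assumed confidence: 1.0 on labeled frames, lower farther away.
        confidences.append(1.0 / (1.0 + decay * gap))
    return labels, confidences


# Example: five frames, with manual labels on the first and last frame only.
labels, confidences = interpolate_pose_labels(5, {0: 0.0, 4: 40.0})
# Keep only frames whose (assumed) confidence meets a chosen threshold.
reliable = [t for t, c in enumerate(confidences) if c >= 0.9]
```

A downstream estimator could use a threshold like the one above to trade the number of usable frames against the labeling accuracy it needs, which is the role the abstract assigns to the confidence value.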


Keywords: Semi-automatic labeling; Real-world video sequence; Head pose; Automatic face tracking; Bag-of-words; Manifold

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Centre for Intelligent Machines, McGill University, Montréal, Canada
