Face detection and tracking using hybrid margin-based ROI techniques


This study addresses the low accuracy and slow processing speed of real-time face detection and tracking systems. A margin-based region of interest (ROI) approach, with fixed and dynamic margin concepts, is proposed to reduce processing time. In addition, a hybrid system is developed to boost accuracy and compensate for the deficiencies of the main detection algorithm. The hybrid approach consists of two routines: a main routine and an escape routine. Three algorithms, namely Haar cascade, Joint cascade, and multi-task convolutional neural networks, are each used independently as the main routine to evaluate the effectiveness of the proposed hybrid approach. The escape routine, based on a template matching algorithm, is designed to improve detection accuracy and to evaluate the effectiveness of the hybrid approach. Two RGB video datasets were used for experiments and evaluation; they cover variations in face pose, video background, illumination, video resolution, expression, overexposed faces, and occlusion of people in various unseen environments. The experimental results confirm that the hybrid approach can detect and track faces in non-frontal orientations with better accuracy and faster processing, i.e., four times faster than conventional full-frame scanning techniques.
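
The interplay between the margin-based ROI and the two routines can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`expand_roi`, `dynamic_margin`, `track_step`), the box layout, and in particular the rule that the dynamic margin is proportional to the face size are assumptions made purely for illustration, since the abstract does not define them. The detectors are passed in as plain callables, so any main routine (for example a Haar cascade) and any escape routine (for example template matching) could be plugged in.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in frame pixels
Detector = Callable[[object, Box], Optional[Box]]  # hypothetical detector signature


def expand_roi(box: Box, frame_w: int, frame_h: int, margin: int) -> Box:
    """Grow a face box by `margin` pixels on each side, clipped to the frame."""
    x, y, w, h = box
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1, y1 = min(frame_w, x + w + margin), min(frame_h, y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)


def dynamic_margin(box: Box, scale: float = 0.5) -> int:
    """Assumed rule (not from the paper): margin scales with the face size."""
    _, _, w, h = box
    return int(scale * max(w, h))


def track_step(frame, prev_box: Optional[Box], frame_w: int, frame_h: int,
               main_detect: Detector, escape_detect: Detector,
               margin: Optional[int] = None) -> Optional[Box]:
    """One frame of the hybrid loop: main routine first, escape routine as fallback."""
    if prev_box is None:
        # No previous face: fall back to scanning the full frame.
        return main_detect(frame, (0, 0, frame_w, frame_h))
    # Fixed margin if one is given, otherwise the assumed dynamic margin.
    m = dynamic_margin(prev_box) if margin is None else margin
    roi = expand_roi(prev_box, frame_w, frame_h, m)
    found = main_detect(frame, roi)
    if found is None:
        # Main routine missed the face inside the ROI: try the escape routine.
        found = escape_detect(frame, roi)
    return found
```

Restricting the search to the expanded box around the previous detection is what saves time relative to full-frame scanning, while the fallback call is where a template-matching escape routine would take over when the main detector fails, for instance on a non-frontal face.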





Author information



Corresponding author

Correspondence to Bacha Rehman.

Ethics declarations

Conflict of interest

All the authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


About this article


Cite this article

Rehman, B., Ong, W.H., Tan, A.C.H. et al. Face detection and tracking using hybrid margin-based ROI techniques. Vis Comput 36, 633–647 (2020). https://doi.org/10.1007/s00371-019-01649-y



Keywords

  • Face detection
  • Joint cascade
  • Convolutional neural network
  • Haar cascade
  • Template matching
  • Region of interest
  • Hybrid model
  • Dynamic margin
  • Face tracking
  • Processing time