Face detection and tracking using hybrid margin-based ROI techniques

  • Bacha Rehman
  • Wee Hong Ong
  • Abby Chee Hong Tan
  • Trung Dung Ngo
Original Article


This study addresses the low accuracy and slow processing speed of real-time face detection and tracking systems. A margin-based region of interest (ROI) approach with fixed and dynamic margin concepts is proposed to speed up processing. In addition, a hybrid system is developed to boost accuracy and overcome the deficiencies of the main detection algorithm. The approach consists of two routines: a main routine and an escape routine. Three algorithms are used independently as the main routine to evaluate the effectiveness of the proposed hybrid approach: the Haar cascade, the joint cascade, and multitask convolutional neural networks. The escape routine, based on the template matching algorithm, compensates for faces missed by the main routine and thereby improves detection accuracy. Two RGB video datasets with diversity and variation in face poses, video backgrounds, illumination, video resolution, expressions, overexposed faces, and occlusions of people in various unseen environments were used for the experiments and evaluation. The results confirm that the hybrid approach detects and tracks faces in non-frontal orientations with better accuracy and faster processing speed, i.e., four times faster than conventional full-frame scanning techniques.


Face detection · Joint cascade · Convolutional neural network · Haar cascade · Template matching · Region of interest · Hybrid model · Dynamic margin · Face tracking · Processing time


Compliance with ethical standards

Conflict of interest

All the authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Science, Universiti Brunei Darussalam, Bandar Seri Begawan, Brunei Darussalam
  2. The More-Than-One Robotics Laboratory, University of Prince Edward Island, Charlottetown, Canada