Voxel-based 3D face reconstruction and its application to face recognition using sequential deep learning

  • Sahil Sharma
  • Vijay Kumar


This paper proposes a novel 3D face reconstruction technique together with a sequential deep learning framework for face recognition. The reconstruction operates on the voxels produced by voxelization and applies the reflection principle about the mid-face plane to generate the reconstructed points in 3D. A sequential deep learning framework then recognizes gender, emotion, occlusion, and person identity from the reconstructed face. The framework combines variational autoencoders, bidirectional long short-term memory, and triplet loss training. It refines the reconstructed voxels into deep features, and a support vector machine is applied to these features for the final prediction. The proposed 3D face recognition system is compared with three well-known deep learning approaches on three occluded datasets. Experimental results show that the proposed technique is invariant to occlusion and facial expression. On the Bosphorus, UMB-DB, and KinectFaceDB datasets respectively, it recognizes gender with accuracies of 97.28%, 92.12%, and 94.44%; emotion with accuracies of 94.57%, 87.78%, and 89.95%; occlusion with accuracies of 94.02%, 81.26%, and 89.85%; and person identity with accuracies of 90.01%, 78.21%, and 85.68%. The proposed framework outperforms state-of-the-art approaches in both computational time and face recognition accuracy.
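The two geometric and training ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the mid-face plane is already known as a point and a normal vector, it uses the FaceNet-style triplet loss on squared Euclidean distances, and the function names `reflect_across_plane` and `triplet_loss` as well as the default margin of 0.2 are hypothetical choices made for illustration.

```python
import math


def reflect_across_plane(point, plane_point, plane_normal):
    """Mirror a 3D point across a plane (assumed here to be the mid-face plane).

    The plane is described by any point lying on it and a normal vector.
    Both are assumptions of this sketch; the abstract does not say how the
    mid-face plane is estimated.
    """
    norm = math.sqrt(sum(c * c for c in plane_normal))
    if norm == 0.0:
        raise ValueError("plane_normal must be non-zero")
    unit = [c / norm for c in plane_normal]
    # Signed distance from the point to the plane along the unit normal.
    dist = sum((p - q) * n for p, q, n in zip(point, plane_point, unit))
    # Move the point twice that distance back through the plane.
    return tuple(p - 2.0 * dist * n for p, n in zip(point, unit))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on squared Euclidean distances between embeddings.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (a hypothetical default, not from the paper).
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(d_pos - d_neg + margin, 0.0)


if __name__ == "__main__":
    # Mirror a visible voxel centre across the plane x = 0 to fill in the
    # occluded half of the face.
    print(reflect_across_plane((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
    # An easy triplet: the negative is already far away from the anchor.
    print(triplet_loss((0.0, 0.0), (0.0, 1.0), (2.0, 0.0)))
```

Reflecting a point twice across the same plane returns the original point, which gives a quick sanity check for any plane estimate used in practice.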


Keywords: Face reconstruction, Voxel, Sequential deep learning, Face recognition, Gender, Emotion, Occlusion




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India
  2. Computer Science and Engineering Department, National Institute of Technology, Hamirpur, India
