Implementation of multimodal biometric recognition via multi-feature deep learning networks and feature fusion

Multimedia Tools and Applications


Although facial recognition has been studied extensively, it still faces significant challenges arising from variations in aging, pose, occlusion, resolution, and appearance. In this paper, we propose a Multi-feature Deep Learning Network (MDLN) architecture that uses modalities from the facial and periocular regions, together with texture descriptors, to improve recognition performance. Specifically, MDLN is designed as a feature-level fusion approach that correlates the multimodal biometric data with the texture descriptors to create a new feature representation. The proposed MDLN model therefore draws on more information through this fused representation, achieving better performance while overcoming limitations that persist in existing unimodal deep learning approaches. We evaluated the proposed model on several public datasets, and our experiments show that MDLN improves biometric recognition performance under challenging conditions, including variations in illumination, appearance, and pose misalignment.
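To make the idea of feature-level fusion concrete, the following is a minimal sketch and not the MDLN architecture itself. It assumes, purely for illustration, that the texture descriptor is a basic 8-neighbour local binary pattern (LBP) histogram, that the facial and periocular features are fixed-length vectors (random placeholders stand in for network outputs here), and that fusion is plain concatenation of L2-normalised vectors. The function names `lbp_histogram`, `l2_normalize`, and `fuse_features` are hypothetical, and the abstract does not specify the actual fusion operation.

```python
import numpy as np


def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour LBP histogram (256 bins) of a 2-D grayscale image.

    Assumption: LBP is used only as an example texture descriptor.
    """
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Offsets of the 8 neighbours, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


def l2_normalize(v, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit L2 norm so no modality dominates by magnitude."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v) + eps)


def fuse_features(face_feat, periocular_feat, texture_feat) -> np.ndarray:
    """Feature-level fusion by concatenation (an assumed, simplest-case choice)."""
    parts = [l2_normalize(p) for p in (face_feat, periocular_feat, texture_feat)]
    return np.concatenate(parts)


# Placeholder inputs: random data stands in for a real image and real embeddings.
rng = np.random.default_rng(0)
face_crop = rng.integers(0, 256, size=(64, 64))
face_feat = rng.standard_normal(128)        # hypothetical facial embedding
periocular_feat = rng.standard_normal(64)   # hypothetical periocular embedding

fused = fuse_features(face_feat, periocular_feat, lbp_histogram(face_crop))
print(fused.shape)  # (448,) = 128 + 64 + 256
```

Normalising each part before concatenation is one common way to keep a single modality from dominating the fused vector. A learned fusion layer, which the paper's description of correlating modalities suggests, would replace the concatenation step.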


Corresponding author

Correspondence to Yong Man Ro.


Cite this article

Tiong, L.C.O., Kim, S.T. & Ro, Y.M. Implementation of multimodal biometric recognition via multi-feature deep learning networks and feature fusion. Multimed Tools Appl 78, 22743–22772 (2019).
