Abstract
The large domain discrepancy between faces captured in the polarimetric (or conventional) thermal and visible domains makes cross-domain face verification a highly challenging problem for human examiners as well as computer vision algorithms. Previous approaches utilize either a two-step procedure (visible feature estimation followed by visible image reconstruction) or an input-level fusion technique, in which the different Stokes images are concatenated and used as a multi-channel input to synthesize the visible image from the corresponding polarimetric signatures. Although these methods have yielded improvements, we argue that input-level fusion alone may not exploit the full potential of the available Stokes images. We propose a generative adversarial network (GAN)-based multi-stream feature-level fusion technique to synthesize high-quality visible images from polarimetric thermal images. The proposed network consists of a generator sub-network, constructed from an encoder–decoder network based on dense residual blocks, and a multi-scale discriminator sub-network. The generator is trained by optimizing an adversarial loss together with a perceptual loss and an identity-preserving loss, enabling photo-realistic synthesis of visible images while preserving discriminative identity characteristics. An extended dataset consisting of polarimetric thermal facial signatures of 111 subjects is also introduced. Experiments under multiple evaluation protocols demonstrate that the proposed method achieves state-of-the-art performance. Code will be made available at https://github.com/hezhangsprinter.
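The abstract describes a generator objective that combines three terms: adversarial, perceptual, and identity-preserving. As a rough illustration only, a minimal PyTorch-style sketch of assembling such a composite loss might look like the following; the weights `lambda_p` and `lambda_i` and the feature extractors `vgg_features` and `id_features` are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def generator_loss(discriminator, vgg_features, id_features,
                   fake_vis, real_vis, lambda_p=1.0, lambda_i=1.0):
    """Sketch of a composite objective: adversarial + perceptual + identity.

    `vgg_features` / `id_features` stand in for a pretrained perceptual
    network (e.g. VGG) and a face-recognition embedding; `lambda_p` and
    `lambda_i` are illustrative weights, not the paper's tuned values.
    """
    # Adversarial term: push the discriminator to label synthesized
    # visible faces as real.
    pred_fake = discriminator(fake_vis)
    adv = F.binary_cross_entropy_with_logits(
        pred_fake, torch.ones_like(pred_fake))

    # Perceptual term: match deep features of synthesized and real images.
    perc = F.l1_loss(vgg_features(fake_vis), vgg_features(real_vis))

    # Identity-preserving term: match face-embedding features so the
    # synthesized face keeps its discriminative identity characteristics.
    ident = F.l1_loss(id_features(fake_vis), id_features(real_vis))

    return adv + lambda_p * perc + lambda_i * ident
```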
Notes
Input-level fusion can be regarded as an extreme case of low-level feature fusion: low-level features (from shallow layers) tend to preserve edge information rather than semantic mid- or high-level class-specific information (Zeiler and Fergus 2014).
Weights are not shared among the streams; a minimal sketch of this multi-stream design appears after these notes.
The feature map size (width and height) at each level is the same across streams.
This baseline network consists of a single encoder stream followed by the same decoder, without multi-level pooling.
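To make the multi-stream feature-level fusion described in these notes concrete, here is a minimal PyTorch-style sketch under stated assumptions: each Stokes input gets its own unshared encoder, the per-stream feature maps share spatial size, and fusion happens by channel-wise concatenation at the feature level rather than at the input. Layer widths and depths are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiStreamFusion(nn.Module):
    """Sketch of feature-level fusion for three Stokes inputs.

    Each stream has its own (unshared) encoder, per the notes above;
    per-stream feature maps share spatial size, so they can be
    concatenated channel-wise before a shared decoder head.
    """
    def __init__(self, channels=32):
        super().__init__()
        # One independent encoder per Stokes image (e.g. S0, S1, S2).
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, channels, 3, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.LeakyReLU(0.2),
            ) for _ in range(3)
        ])
        # Fuse at the feature level: concatenate along channels, then mix.
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        # Decoder head mapping fused features to a 3-channel visible image.
        self.decode = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, s0, s1, s2):
        feats = [enc(x) for enc, x in zip(self.encoders, (s0, s1, s2))]
        fused = self.fuse(torch.cat(feats, dim=1))  # same H x W per stream
        return torch.sigmoid(self.decode(fused))
```

Calling `MultiStreamFusion()(s0, s1, s2)` on three tensors of shape (N, 1, H, W) encodes each Stokes image independently before fusing, in contrast to input-level fusion, which would simply concatenate the raw images into one multi-channel input.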
References
Berthelot, D., Schumm, T., & Metz, L. (2017). Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717.
Bodla, N., Zheng, J., Xu, H., Chen, J. C., Castillo, C., & Chellappa, R. (2017). Deep heterogeneous feature fusion for template-based face recognition. In 2017 IEEE winter conference on applications of computer vision (WACV) (pp. 586–595). IEEE.
Chen, J. C., Patel, V. M., & Chellappa, R. (2016). Unconstrained face verification using deep cnn features. In 2016 IEEE winter conference on applications of computer vision (WACV) (pp. 1–9). IEEE.
Chen, J. C., Ranjan, R., Sankaranarayanan, S., Kumar, A., Chen, C. H., Patel, V. M., et al. (2017). Unconstrained still/video-based face verification with deep convolutional neural networks. International Journal of Computer Vision. https://doi.org/10.1007/s11263-017-1029-3.
Chen, X., Flynn, P. J., & Bowyer, K. W. (2005). Ir and visible light face recognition. Computer Vision and Image Understanding, 99(3), 332–358.
Creswell, A., & Bharath, A. A. (2016). Task specific adversarial cost function. arXiv preprint arXiv:1609.08661.
Di, X., Zhang, H., & Patel, V. M. (2019). Polarimetric thermal to visible face verification via attribute preserved synthesis. CoRR abs/1901.00889 arXiv:1901.00889.
Ding, H., Zhou, S. K., & Chellappa, R. (2017). Facenet2expnet: Regularizing a deep face recognition net for expression recognition. In 2017 12th IEEE international conference on automatic face and gesture recognition (FG 2017) (pp. 118–126). IEEE.
Espinosa-Duró, V., Faundez-Zanuy, M., & Mekyska, J. (2013). A new face database simultaneously acquired in visible, near-infrared and thermal spectrums. Cognitive Computation, 5(1), 119–135.
Gao, F., Shi, S., Yu, J., & Huang, Q. (2017). Composition-aided sketch-realistic portrait generation. arXiv preprint arXiv:1712.00899.
Gonzalez-Sosa, E., Vera-Rodriguez, R., Fierrez, J., & Patel, V. M. (2017a). Exploring body shape from mmw images for person recognition. IEEE Transactions on Information Forensics and Security, 12(9), 2078–2089.
Gonzalez-Sosa, E., Vera-Rodriguez, R., Fierrez, J., & Patel, V. M. (2017b). Millimetre wave person recognition: Hand-crafted vs. learned features. In ISBA (pp. 1–7)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In NIPS (pp. 2672–2680).
Gurton, K. P., Yuffa, A. J., & Videen, G. W. (2014). Enhanced facial recognition for thermal imagery using polarimetric imaging. Optics Letters, 39(13), 3857–3859.
He, R., Cao, J., Song, L., Sun, Z., & Tan, T. (2019). Cross-spectral face completion for nir-vis heterogeneous face recognition. arXiv preprint arXiv:1902.03565.
He, R., Wu, X., Sun, Z., & Tan, T. (2017). Wasserstein cnn: Learning invariant features for nir-vis face recognition. arXiv preprint arXiv:1708.02412.
Hu, S., Choi, J., Chan, A. L., & Schwartz, W. R. (2015). Thermal-to-visible face recognition using partial least squares. JOSA A, 32(3), 431–442.
Hu, S., Short, N. J., Riggan, B. S., Gordon, C., Gurton, K. P., Thielke, M., Gurram, P., & Chan, A. L. (2016). A polarimetric thermal database for face recognition research. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 119–126).
Huang, G., Liu, Z., Weinberger, K. Q., & van der Maaten, L. (2016). Densely connected convolutional networks. arXiv preprint arXiv:1608.06993.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of The 32nd international conference on machine learning (pp. 448–456).
Iranmanesh, S. M., Dabouei, A., Kazemi, H., & Nasrabadi, N. M. (2018). Deep cross polarimetric thermal-to-visible face recognition. arXiv e-prints.
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In 2017 IEEE conference on computer vision and pattern recognition (CVPR).
Jetchev, N., Bergmann, U., & Vollgraf, R. (2016). Texture synthesis with spatial generative adversarial networks. arXiv preprint arXiv:1611.08207.
Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In European conference on computer vision (pp. 694–711). Springer.
Karacan, L., Akata, Z., Erdem, A., & Erdem, E. (2016). Learning to generate images of outdoor scenes from attributes and semantic layouts. arXiv preprint arXiv:1612.00215.
Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Klare, B., & Jain, A. K. (2010). Heterogeneous face recognition: Matching nir to visible light images. In ICPR (pp. 1513–1516).
Klare, B. F., & Jain, A. K. (2013). Heterogeneous face recognition using kernel prototype similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(6), 1410–1422.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., et al. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4681–4690).
Lezama, J., Qiu, Q., & Sapiro, G. (2017). Not afraid of the dark: Nir-vis face recognition via cross-spectral hallucination and low-rank embedding. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 6807–6816). IEEE.
Li, S., Yi, D., Lei, Z., & Liao, S. (2013). The casia nir-vis 2.0 face database. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 348–353).
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In ICML workshop on deep learning for audio, speech and language processing.
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5188–5196).
Meyers, E., & Wolf, L. (2008). Using biologically inspired features for face processing. International Journal of Computer Vision, 76(1), 93–104.
Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807–814).
Nicolo, F., & Schmid, N. A. (2012). Long range cross-spectral face recognition: Matching swir against visible light images. IEEE Transactions on Information Forensics and Security, 7(6), 1717–1726.
Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In Proceedings of the British machine vision conference (BMVC).
Peng, C., Gao, X., Wang, N., Tao, D., Li, X., & Li, J. (2016). Multiple representations-based face sketch-photo synthesis. IEEE Transactions on Neural Networks and Learning Systems, 27(11), 2201–2215.
Peng, X., Feris, R. S., Wang, X., & Metaxas, D. N. (2016). A recurrent encoder–decoder network for sequential face alignment. In European conference on computer vision (pp. 38–56). Springer International Publishing.
Peng, X., Tang, Z., Yang, F., Feris, R., & Metaxas, D. (2018). Jointly optimize data augmentation and network training: Adversarial data augmentation in human pose estimation. arXiv preprint arXiv:1805.09707.
Peng, X., Yu, X., Sohn, K., Metaxas, D. N., & Chandraker, M. (2017). Reconstruction-based disentanglement for pose-invariant face recognition. In Proceedings of the IEEE international conference on computer vision.
Perera, P., Abavisani, M., & Patel, V. M. (2017). In2i: Unsupervised multi-image-to-image translation using generative adversarial networks. arXiv preprint arXiv:1711.09334.
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Ranjan, R., Sankaranarayanan, S., Bansal, A., Bodla, N., Chen, J. C., Patel, V. M., et al. (2018). Deep learning for understanding faces: Machines may be just as good, or better, than humans. IEEE Signal Processing Magazine, 35(1), 66–83. https://doi.org/10.1109/MSP.2017.2764116.
Ranjan, R., Sankaranarayanan, S., Castillo, C. D., & Chellappa, R. (2017). An all-in-one convolutional neural network for face analysis. In 2017 12th IEEE international conference on automatic face and gesture recognition (FG 2017) (pp. 17–24). IEEE.
Riggan, B. S., Reale, C., & Nasrabadi, N. M. (2015). Coupled auto-associative neural networks for heterogeneous face recognition. IEEE Access, 3, 1620–1632. https://doi.org/10.1109/ACCESS.2015.2479620.
Riggan, B. S., Short, N. J., & Hu, S. (2016a). Optimal feature learning and discriminative framework for polarimetric thermal to visible face recognition. In 2016 IEEE winter conference on applications of computer vision (WACV) (pp. 1–7). IEEE.
Riggan, B. S., Short, N. J., Hu, S., & Kwon, H. (2016b). Estimation of visible spectrum faces from polarimetric thermal faces. In 2016 IEEE 8th international conference on biometrics theory, applications and systems (BTAS) (pp. 1–7). IEEE.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training gans. In NIPS (pp. 2226–2234).
Sarfraz, M. S., & Stiefelhagen, R. (2015). Deep perceptual mapping for thermal to visible face recognition. arXiv preprint arXiv:1507.02879.
Sarfraz, M. S., & Stiefelhagen, R. (2017). Deep perceptual mapping for cross-modal face recognition. International Journal of Computer Vision, 122(3), 426–438.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815–823).
Short, N., Hu, S., Gurram, P., Gurton, K., & Chan, A. (2015). Improving cross-modal face recognition using polarimetric imaging. Optics Letters, 40(6), 882–885. https://doi.org/10.1364/OL.40.000882.
Sindagi, V. A., & Patel, V. M. (2017). Generating high-quality crowd density maps using contextual pyramid cnns. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1861–1870).
Song, L., Zhang, M., Wu, X., & He, R. (2018). Adversarial discriminative heterogeneous face recognition. In AAAI.
Sun, Y., Chen, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. In Advances in neural information processing systems (pp. 1988–1996).
Tran, L., Yin, X., & Liu, X. (2017). Disentangled representation learning GAN for pose-invariant face recognition. In Proceeding of IEEE computer vision and pattern recognition (CVPR).
Tyo, J. S., Goldstein, D. L., Chenault, D. B., & Shaw, J. A. (2006). Review of passive imaging polarimetry for remote sensing applications. Applied Optics, 45(22), 5453–5469.
Wang, L., Sindagi, V. A., & Patel, V. M. (2018). High-quality facial photo-sketch synthesis using multi-adversarial networks. In IEEE international conference on automatic face and gesture recognition.
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE TIP, 13(4), 600–612.
Wu, X., He, R., Sun, Z., & Tan, T. (2018). A light cnn for deep face representation with noisy labels. IEEE Transactions on Information Forensics and Security, 13(11), 2884–2896.
Wu, X., Huang, H., Patel, V. M., He, R., & Sun, Z. (2018). Disentangled variational representation for heterogeneous face recognition. arXiv preprint arXiv:1809.01936.
Xie, S., & Tu, Z. (2015). Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision (pp. 1395–1403).
Xu, H., Zheng, J., Alavi, A., & Chellappa, R. (2016). Learning a structured dictionary for video-based face recognition. In 2016 IEEE winter conference on applications of computer vision (WACV) (pp. 1–9). IEEE.
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., & He, X. (2017). Attngan: Fine-grained text to image generation with attentional generative adversarial networks. arXiv preprint arXiv:1711.10485.
Xu, Z., Yang, X., Li, X., & Sun, X. (2018). Strong baseline for single image dehazing with deep features and instance normalization. In BMVC (Vol. 2, p. 5).
Yang, J., Ren, P., Zhang, D., Chen, D., Wen, F., Li, H., & Hua, G. (2017). Neural aggregation network for video face recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4362–4371).
Yang, X., Xu, Z., & Luo, J. (2018). Towards perceptual image dehazing by physics-based disentanglement and adversarial training. In Thirty-second AAAI conference on artificial intelligence.
Yi, D., Lei, Z., & Li, S. Z. (2015). Shared representation learning for heterogenous face recognition. In 2015 11th IEEE international conference and workshops on automatic face and gesture recognition (FG) (Vol. 1, pp. 1–7). IEEE.
Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., & Huang, T. S. (2018). Generative image inpainting with contextual attention. arXiv preprint arXiv:1801.07892.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818–833). Springer.
Zhang, H., & Dana, K. (2017). Multi-style generative network for real-time transfer. arXiv preprint arXiv:1703.06953.
Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., & Agrawal, A. (2018). Context encoding for semantic segmentation. In The IEEE conference on computer vision and pattern recognition (CVPR).
Zhang, H., & Patel, V. M. (2018). Densely connected pyramid dehazing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3194–3203).
Zhang, H., Patel, V. M., Riggan, B. S., & Hu, S. (2017a). Generative adversarial network-based synthesis of visible faces from polarimetric thermal faces. In International joint conference on biometrics 2017.
Zhang, H., Sindagi, V., & Patel, V. M. (2017b). Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957.
Zhang, Z., Yang, L., & Zheng, Y. (2018). Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network. arXiv preprint arXiv:1802.09655.
Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2881–2890).
Zhao, J., Mathieu, M., & LeCun, Y. (2016). Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126.
Zhu, Y., Elhoseiny, M., Liu, B., & Elgammal, A. (2017). Imagine it for me: Generative adversarial approach for zero-shot learning from noisy texts. arXiv preprint arXiv:1712.01381.
Acknowledgements
We would like to thank Vishwanath A. Sindagi for his insightful discussions on this topic.
Additional information
Communicated by Dr. Rama Chellappa, Dr. Xiaoming Liu, Dr. Tae-Kyun Kim, Dr. Fernando De la Torre and Dr. Chen Change Loy.
This work was supported by an ARO Grant W911NF-16-1-0126.
Cite this article
Zhang, H., Riggan, B.S., Hu, S. et al. Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks. Int J Comput Vis 127, 845–862 (2019). https://doi.org/10.1007/s11263-019-01175-3