Cross-Modal Face Matching: Beyond Viewed Sketches

  • Shuxin Ouyang
  • Timothy Hospedales
  • Yi-Zhe Song
  • Xueming Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9004)


Matching face images across different modalities is a challenging open problem for several reasons, notably feature heterogeneity and, particularly in the case of sketch recognition, abstraction, exaggeration and distortion. Existing studies have attempted to address this task by engineering invariant features or by learning a common subspace between the modalities. In this paper, we take a different approach and explore learning a mid-level representation within each domain that allows faces in each modality to be compared in a domain-invariant way. In particular, we investigate sketch-photo face matching and go beyond the well-studied viewed sketches to tackle forensic sketches and caricatures, where representations are often symbolic. We approach this by learning a facial attribute model independently in each domain that represents faces in terms of semantic properties. This representation is thus more invariant to heterogeneity and distortion, and more robust to misalignment. Our intermediate-level attribute representation is then combined with the original low-level features using canonical correlation analysis (CCA). Our framework shows impressive results on cross-modal matching tasks using forensic sketches and the even more challenging caricature sketches. Furthermore, we create a new dataset with approximately 59,000 attribute annotations for evaluation and to facilitate future research.
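
The core idea of the abstract, comparing faces through attributes predicted separately in each modality, can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the authors' implementation: the attribute names, the linear-plus-sigmoid attribute predictors, the weight matrices, the feature values and the identities are all hypothetical, and the CCA fusion with low-level features is omitted. The sketch features are 2-D and the photo features 3-D to emphasise that the raw features are not directly comparable, while the predicted attribute vectors are.

```python
import math

# Hypothetical attribute vocabulary shared by both modalities.
ATTRIBUTES = ["big_nose", "glasses", "beard"]

# Hypothetical per-domain attribute models: one weight row per attribute.
# Sketch features are 2-D and photo features are 3-D, so the raw
# features of the two modalities cannot be compared directly.
SKETCH_WEIGHTS = [[2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]
PHOTO_WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical photo gallery: identity -> low-level photo feature vector.
GALLERY = {
    "alice": [3.0, -3.0, 2.0],
    "bob": [-3.0, 3.0, -2.0],
}


def predict_attributes(features, weights):
    """Score each attribute with a linear model followed by a sigmoid."""
    scores = []
    for row in weights:
        z = sum(w * x for w, x in zip(row, features))
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(sketch_features, gallery, sketch_weights, photo_weights):
    """Return the gallery identity nearest to the probe in attribute space."""
    probe = predict_attributes(sketch_features, sketch_weights)

    def distance(item):
        _, photo_features = item
        return euclidean(probe, predict_attributes(photo_features, photo_weights))

    identity, _ = min(gallery.items(), key=distance)
    return identity


if __name__ == "__main__":
    # Probe sketch described by 2-D sketch features (hypothetical values).
    print(match([1.5, -1.5], GALLERY, SKETCH_WEIGHTS, PHOTO_WEIGHTS))
```

In a real system the per-domain predictors would be trained on annotated data, and the attribute vectors would be fused with low-level descriptors rather than used alone.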


Partial Little Square; Face Recognition; Local Binary Pattern; Canonical Correlation Analysis; Attribute Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Shuxin Ouyang (1)
  • Timothy Hospedales (2)
  • Yi-Zhe Song (2)
  • Xueming Li (1)

  1. Beijing University of Posts and Telecommunications, Beijing, China
  2. School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK
