Abstract
Cross-depiction is the recognition, and synthesis, of objects whether they are photographed, painted, drawn, or otherwise depicted. It is a significant yet under-researched problem. Emulating the remarkable human ability to recognise and depict objects in an astonishingly wide variety of depictive forms is likely to advance both the foundations and the applications of computer vision. In this paper we motivate the cross-depiction problem, explain why it is difficult, and discuss some current approaches. Our main conclusions are that (i) appearance-based recognition systems tend to be over-fitted to one depiction, (ii) models that explicitly encode spatial relations between parts are more robust, and (iii) recognition and non-photorealistic synthesis are related tasks.
Additional information
This article is published with open access at Springerlink.com
Peter Hall is a professor of visual computing at the University of Bath and a director of its Media Technology Research Centre. His research interests span computer vision and computer graphics; he is particularly interested in the relationship between photographs and artwork of all kinds, and in 3D model acquisition of complex phenomena from video.
Hongping Cai is a postdoctoral researcher in the Media Technology Research Centre, University of Bath, UK. She received her Ph.D. degree from the National University of Defense Technology, China, in 2010. During her Ph.D., she spent about two years as a visiting student at the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. She joined the Centre for Machine Perception, Czech Technical University, Prague, as a postdoctoral researcher in 2012. Her research interests include cross-depiction classification and detection, texture-less object detection, visual codebook learning, and discriminant descriptor learning.
Qi Wu is currently a postdoctoral researcher at the Australian Centre for Visual Technologies, University of Adelaide. He received his Ph.D. degree from the University of Bath, UK, in 2015. His research interests include cross-depiction object detection and classification, attribute learning, neural networks, and image captioning.
Tadeo Corradi is a Ph.D. student in the Department of Mechanical Engineering, University of Bath, UK. His research interests include visual and tactile robotics, multi-modal object classification, machine learning, and visuo-tactile integration.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Hall, P., Cai, H., Wu, Q. et al. Cross-depiction problem: Recognition and synthesis of photographs and artwork. Comp. Visual Media 1, 91–103 (2015). https://doi.org/10.1007/s41095-015-0017-1
DOI: https://doi.org/10.1007/s41095-015-0017-1