Cross-depiction problem: Recognition and synthesis of photographs and artwork

Hall, Peter; Cai, Hongping; Wu, Qi; Corradi, Tadeo

doi:10.1007/s41095-015-0017-1

Research Article
Open access
Published: 18 September 2015

Volume 1, pages 91–103, (2015)

Computational Visual Media


Peter Hall¹, Hongping Cai¹, Qi Wu² & Tadeo Corradi¹

Abstract

Cross-depiction is the recognition—and synthesis—of objects whether they are photographed, painted, drawn, etc. It is a significant yet underresearched problem. Emulating the remarkable human ability to recognise and depict objects in an astonishingly wide variety of depictive forms is likely to advance both the foundations and the applications of computer vision. In this paper we motivate the cross-depiction problem, explain why it is difficult, and discuss some current approaches. Our main conclusions are (i) appearance-based recognition systems tend to be over-fitted to one depiction, (ii) models that explicitly encode spatial relations between parts are more robust, and (iii) recognition and non-photorealistic synthesis are related tasks.


Author information

Authors and Affiliations

Department of Computer Science, University of Bath, Bath, UK
Peter Hall, Hongping Cai & Tadeo Corradi
School of Computer Science, University of Adelaide, Adelaide, Australia
Qi Wu

Corresponding author

Correspondence to Peter Hall.

Additional information

This article is published with open access at Springerlink.com

Peter Hall is a professor of visual computing at the University of Bath and also a director of the Media Technology Research Centre at the University of Bath. His research interests cover both computer vision and computer graphics; the relationship between photographs and artwork of all kinds, and 3D model acquisition of complex phenomena from video, are of particular interest.

Hongping Cai is a postdoctoral researcher in the Media Technology Research Centre, University of Bath, UK. She received her Ph.D. degree from the National University of Defense Technology, China, in 2010. During her Ph.D., she spent about two years as a visiting student in the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. She joined the Centre for Machine Perception, Czech Technical University, Prague, as a postdoctoral researcher in 2012. Her research interests include cross-depiction classification and detection, texture-less object detection, visual codebook learning, and discriminant descriptor learning.

Qi Wu is currently a postdoctoral researcher in the Australian Centre for Visual Technologies, University of Adelaide. He received his Ph.D. degree from the University of Bath, UK, in 2015. His research interests include cross-depiction object detection and classification, attribute learning, neural networks, and image captioning.

Tadeo Corradi is a Ph.D. student at the Mechanical Engineering Department of the University of Bath, UK. His research interests include visual and tactile robotics, multi-modal object classification, machine learning, and visuo-tactile integration.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Hall, P., Cai, H., Wu, Q. et al. Cross-depiction problem: Recognition and synthesis of photographs and artwork. Comp. Visual Media 1, 91–103 (2015). https://doi.org/10.1007/s41095-015-0017-1

Received: 25 March 2015
Accepted: 20 May 2015
Published: 18 September 2015
Issue Date: June 2015
DOI: https://doi.org/10.1007/s41095-015-0017-1
