Sketch-a-Net: A Deep Neural Network that Beats Humans


We propose a deep learning approach to free-hand sketch recognition that achieves state-of-the-art performance, significantly surpassing that of humans. Our superior performance is a result of modelling and exploiting the unique characteristics of free-hand sketches: they consist of an ordered set of strokes but lack visual cues such as colour and texture, they are highly iconic and abstract, and they exhibit extremely large appearance variations due to different levels of abstraction and deformation. Specifically, our deep neural network, termed Sketch-a-Net, has the following novel components: (i) We propose a network architecture designed for sketch rather than natural photo statistics. (ii) We develop two novel data augmentation strategies that exploit the unique sketch-domain properties to modify and synthesise sketch training data at multiple abstraction levels. Based on this idea we are able both to significantly increase the volume and diversity of sketches for training and to address the challenge of varying levels of sketching detail commonplace in free-hand sketches. (iii) We explore different network ensemble fusion strategies, including a re-purposed joint Bayesian scheme, to further improve recognition performance. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Furthermore, through visualising the learned filters, we offer useful insights into where the superior performance of our network comes from.

  1. We set \(k=30\) in this work, and the regularisation parameter of the joint Bayesian (JB) scheme is set to 1. For robustness at test time, we also take 10 crops and reflections of each train and test image (Krizhevsky et al. 2012). This inflates the KNN train and test pools by a factor of 10, and the crop-level matches are combined into image-level predictions by majority voting.
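The crop-and-vote procedure in the note above can be illustrated with a minimal sketch. Several details here are assumptions rather than the authors' implementation: the 10 views are taken to be the four corner crops, the centre crop and their horizontal reflections (the scheme of Krizhevsky et al. 2012), squared Euclidean distance stands in for the joint Bayesian similarity, and the neighbour labels of all crops are pooled into a single vote. The function names `ten_crop` and `knn_vote` are hypothetical.

```python
import numpy as np
from collections import Counter


def ten_crop(image, size):
    """Return 10 views: four corner crops, the centre crop, and their mirrors."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size), (top, left)]
    crops = [image[y:y + size, x:x + size] for y, x in offsets]
    crops += [c[:, ::-1] for c in crops]  # horizontal reflections
    return np.stack(crops)


def knn_vote(test_crop_feats, train_feats, train_labels, k=30):
    """Predict one image label from the features of its crops.

    Assumption: squared Euclidean distance is a stand-in for the joint
    Bayesian (JB) similarity, which is not reproduced here.
    """
    votes = Counter()
    for feat in test_crop_feats:
        dists = np.sum((train_feats - feat) ** 2, axis=1)
        nearest = np.argsort(dists)[:k]  # indices of the k closest training crops
        votes.update(train_labels[nearest].tolist())
    # Majority vote over the pooled neighbour labels of all crops.
    return votes.most_common(1)[0][0]
```

In this reading, `train_feats` holds one feature row per training crop, so the training pool is 10 times the number of training images, and `train_labels` is a NumPy array that repeats each image label for its 10 crops.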


  1. Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In BMVC.

  2. Chen, D., Cao, X., Wang, L., Wen, F., & Sun, J. (2012). Bayesian face revisited: A joint formulation. In ECCV.

  3. Deng, J., Dong, W., Socher, R., Li, L., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR.

  4. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2015). Decaf: A deep convolutional activation feature for generic visual recognition. In ICML.

  5. Eitz, M., Hays, J., & Alexa, M. (2012). How do humans sketch objects? In SIGGRAPH.

  6. Eitz, M., Hildebrand, K., Boubekeur, T., & Alexa, M. (2011). Sketch-based image retrieval: Benchmark and bag-of-features descriptors. TVCG, 17(11), 1624–1636.

  7. Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.

  8. Gabor, D. (1946). Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers, Part III: Radio and Communication Engineering, 93, 429–441.

  9. Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.

  10. Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.

  11. Hu, R., & Collomosse, J. (2013). A performance evaluation of gradient field HOG descriptor for sketch based image retrieval. CVIU, 117(7), 790–806.

  12. Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat’s striate cortex. Journal of Physiology, 148, 574–591.

  13. Jabal, M. F. A., Rahim, M. S. M., Othman, N. Z. S., & Jupri, Z. (2009). A comparative study on extraction and recognition method of CAD data from CAD drawings. In International conference on information management and engineering.

  14. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.

  15. Johnson, G., Gross, M. D., Hong, J., & Do, E. Y.-L. (2009). Computational support for sketching in design: A review. Foundations and Trends in Human–Computer Interaction, 2, 1–93.

  16. Klare, B. F., Li, Z., & Jain, A. K. (2011). Matching forensic sketches to mug shot photos. TPAMI, 33(3), 639–646.

  17. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In NIPS.

  18. Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In NIPS.

  19. LeCun, Y., Bottou, L., Orr, G. B., & Müller, K. (1998). Efficient backprop. In G. Orr & K. Müller (Eds.), Neural networks: Tricks of the trade. Springer.

  20. Li, Y., Hospedales, T. M., Song, Y., & Gong, S. (2015). Free-hand sketch recognition by multi-kernel feature learning. CVIU, 137, 1–11.

  21. Li, Y., Song, Y., & Gong, S. (2013). Sketch recognition by ensemble matching of structured features. In BMVC.

  22. Lu, T., Tai, C., Su, F., & Cai, S. (2005). A new recognition model for electronic architectural drawings. Computer-Aided Design, 37(10), 1053–1069.

  23. Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.

  24. Ouyang, S., Hospedales, T., Song, Y., & Li, X. (2014). Cross-modal face matching: Beyond viewed sketches. In ACCV.

  25. Schaefer, S., McPhail, T., & Warren, J. (2006). Image deformation using moving least squares. TOG, 25(3), 533–540.

  26. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

  27. Schneider, R. G., & Tuytelaars, T. (2014). Sketch classification and classification-driven analysis using Fisher vectors. In SIGGRAPH Asia.

  28. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.

  29. Sousa, P., & Fonseca, M. J. (2009). Geometric matching for clip-art drawing retrieval. Journal of Visual Communication and Image Representation, 20(12), 71–83.

  30. Stollenga, M. F., Masci, J., Gomez, F., & Schmidhuber, J. (2014). Deep networks with internal selective attention through feedback connections. In NIPS.

  31. Wang, F., Kang, L., & Li, Y. (2015). Sketch-based 3D shape retrieval using convolutional neural networks. In CVPR.

  32. Yanık, E., & Sezgin, T. M. (2015). Active learning for sketch recognition. Computers and Graphics, 52, 93–105.

  33. Yin, F., Wang, Q., Zhang, X., & Liu, C. (2013). ICDAR 2013 Chinese handwriting recognition competition. In International conference on document analysis and recognition.

  34. Yu, Q., Yang, Y., Song, Y. Z., Xiang, T., & Hospedales, T. M. (2015). Sketch-a-net that beats humans. In BMVC.

  35. Zeiler, M., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In ECCV.

  36. Zitnick, C. L., & Dollár, P. (2014). Edge boxes: Locating object proposals from edges. In ECCV.

  37. Zitnick, C. L., & Parikh, D. (2013). Bringing semantics into focus using visual abstraction. In CVPR.

  38. Zou, C., Huang, Z., Lau, R. W., Liu, J., & Fu, H. (2015). Sketch-based shape retrieval using pyramid-of-parts. arXiv preprint arXiv:1502.04232.


This project received support from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement #640891, and from the Royal Society and Natural Science Foundation of China (NSFC) Joint Grant #IE141387 and #61511130081. We gratefully acknowledge the support of NVIDIA Corporation for the donation of the GPUs used for this research.

Author information



Corresponding author

Correspondence to Qian Yu.

Additional information

Communicated by Xianghua Xie, Mark Jones, Gary Tam.

Cite this article

Yu, Q., Yang, Y., Liu, F. et al. Sketch-a-Net: A Deep Neural Network that Beats Humans. Int J Comput Vis 122, 411–425 (2017).


  • Sketch recognition
  • Convolutional neural network
  • Data augmentation
  • Stroke ordering
  • Sketch abstraction