Sketch recognition using transfer learning

  • Mustafa Sert
  • Emel Boyacı


Humans have an excellent ability to recognize freehand sketch drawings despite their abstract and sparse structures. Understanding freehand sketches with automated methods is challenging because the sketches are diverse and abstract. In this paper, we propose an efficient freehand sketch recognition scheme based on the feature-level fusion of Convolutional Neural Networks (CNNs) in a transfer learning context. Specifically, we analyse the performance of different layers of distinct ImageNet-pretrained CNNs and combine the best-performing layer features within a CNN-SVM pipeline for recognition. We also employ Principal Component Analysis (PCA) to reduce the dimensionality of the fused deep features, so that the recognition application remains efficient on limited-capacity devices. We perform evaluations on two real sketch benchmark datasets, namely Sketchy and TU-Berlin, to show the effectiveness of the proposed scheme. Our experimental results show that the feature-level fusion scheme with PCA achieves a recognition accuracy of 97.91% and 72.5% on the Sketchy and TU-Berlin datasets, respectively. This result is promising when compared with the human recognition accuracy of 73.1% on the TU-Berlin dataset. We also develop a sketch recognition application for smart devices to demonstrate the proposed scheme.
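The fusion and reduction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays stand in for features taken from two pretrained CNN layers, the L2 normalization before concatenation is an assumption, and the helper names (`fuse_features`, `fit_pca`, `apply_pca`) and the dimensions are hypothetical. The SVM stage is omitted; the reduced features would be the input to that classifier.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length (an assumed preprocessing step)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def fuse_features(feats_a, feats_b):
    """Feature-level fusion: normalize each descriptor, then concatenate."""
    return np.hstack([l2_normalize(feats_a), l2_normalize(feats_b)])

def fit_pca(x, n_components):
    """Return the training mean and the top principal directions via SVD."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(x, mean, components):
    """Project centered features onto the principal directions."""
    return (x - mean) @ components.T

# Hypothetical stand-ins for deep features of 20 sketches from two CNNs.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(20, 64))
feats_b = rng.normal(size=(20, 32))

fused = fuse_features(feats_a, feats_b)          # shape (20, 96)
mean, components = fit_pca(fused, n_components=8)
reduced = apply_pca(fused, mean, components)     # shape (20, 8)
print(fused.shape, reduced.shape)
```

In a full pipeline, the mean and components would be fitted on training features only and then reused unchanged for test sketches, so that the classifier sees the same projection at training and recognition time.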


Keywords: Sketch recognition; Transfer learning; Convolutional neural networks (CNNs); Feature fusion



The authors thank Berkay Selbes for running the feature extraction time experiments.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Engineering, Başkent University, Ankara, Turkey
