Abstract
Deep learning methods for image classification and object detection are reviewed. In particular, we consider deep models such as autoencoders, restricted Boltzmann machines, and convolutional neural networks. Existing software packages for solving deep learning problems are compared.
References
G. E. Hinton, “Learning multiple layers of representation,” Trends Cognitive Sci. 11, 428–434 (2007).
J. Schmidhuber, Deep learning in neural networks: an overview. http://arxiv.org/abs/1404.7828
Resources and pointers to information about Deep Learning. http://deeplearning.net
D. P. Vetrov, “Machine learning: current state and perspectives,” in Proc. of RCDL (Yaroslavl, 2013), Vol. 1, pp. 21–28.
ImageNet. http://www.image-net.org
PASCAL Visual Object Challenge. http://pascallin.ecs.soton.ac.uk/challenges/VOC
C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, “Visual categorization with bags of keypoints,” in Proc. ECCV Int. Workshop on Statistical Learning in CV (Prague, 2004).
H. Lee, A. Battle, R. Raina, and A. Y. Ng, “Efficient sparse coding algorithms,” in Proc. of NIPS (Vancouver, 2006), pp. 801–808.
D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vision 60 (2), 91–110 (2004).
Y. He, K. Kavukcuoglu, Y. Wang, A. Szlam, and Y. Qi, “Unsupervised feature learning by deep sparse coding,” in Proc. of SIAM Int. Conf. on Data Mining (Philadelphia, 2014), pp. 902–910.
J. Yang, K. Yu, and T. Huang, “Supervised translation-invariant sparse coding,” in Proc. of CVPR (San Francisco, 2010), pp. 3517–3524.
Q. Zhang and B. Li, “Discriminative K-SVD for dictionary learning in face recognition,” in Proc. of CVPR (San Francisco, 2010), pp. 2691–2698.
Z. Jiang, Z. Lin, and L. S. Davis, “Learning a discriminative dictionary for sparse coding via label consistent K-SVD,” in Proc. of CVPR (Colorado Springs, 2011), pp. 1697–1704.
A. Coates, H. Lee, and A. Y. Ng, “An analysis of single-layer networks in unsupervised feature learning,” in Proc. of Int. Conf. on Artificial Intelligence and Statistics (Ft. Lauderdale, FL, 2011), Vol. 15, pp. 215–223.
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. of NIPS (Lake Tahoe, 2012), pp. 1097–1105.
K. Simonyan, A. Vedaldi, and A. Zisserman, “Deep fisher networks for large-scale image classification,” in Proc. of NIPS (Lake Tahoe, 2013), pp. 163–171.
C. Szegedy, A. Toshev, and D. Erhan, “Deep neural networks for object detection,” in Proc. of NIPS (Lake Tahoe, 2013), pp. 2553–2561.
D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, “Scalable object detection using deep neural networks,” in Proc. of CVPR (Columbus, OH, 2014).
R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. of CVPR (Columbus, OH, 2014), pp. 580–587.
M. Hayat, M. Bennamoun, and S. An, “Learning nonlinear reconstruction models for image set classification,” in Proc. of CVPR (Columbus, OH, 2014).
M. Ranzato, C. Poultney, and S. Chopra, “Efficient learning of sparse representations with an energy-based model,” in Proc. of NIPS (Vancouver, 2006), pp. 1137–1144.
S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, “Contractive auto-encoders: explicit invariance during feature extraction,” in Proc. of ICML (Bellevue, 2011), pp. 833–840.
P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol, “Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res. 11, 3371–3408 (2010).
K. Kavukcuoglu, P. Sermanet, Y-lan Boureau, K. Gregor, M. Mathieu, and Y. L. Cun, “Learning convolutional feature hierarchies for visual recognition,” in Proc. of NIPS (Vancouver, 2010), pp. 1090–1098.
P. Luo, Y. Tian, X. Wang, and X. Tan, “Switchable deep network for pedestrian detection,” in Proc. of CVPR (Columbus, OH, 2014).
H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proc. of ICML (Montreal, 2009), pp. 609–616.
V. D. Kustikova, N. Yu. Zolotykh, and I. B. Meyerov, “A review of vehicle detection and tracking methods in video,” Vestn. Lobachevsky State Univ. Nizhni Novgorod, No. 5 (2), 347–357 (2012).
P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Trans. Pattern Anal. Mach. Intell. 32 (9), 1627–1645 (2010).
J. Shotton, A. Blake, and R. Cipolla, “Contour-based learning for object detection,” in Proc. ICCV (Beijing, 2005), Vol. 1, pp. 503–510.
C. H. Hilario, J. M. Collado, J. M. Armingol, and A. de la Escalera, “Pyramidal image analysis for vehicle detection,” in Proc. of Intelligent Vehicles Symp. (Las Vegas, 2005), pp. 88–93.
Y. Amit, 2D Object Detection and Recognition: Models, Algorithms and Networks (MIT Press, 2002).
M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis and Machine Vision (Thomson, 2008).
Restricted Boltzmann Machines (RBMs). http://www.deeplearning.net/tutorial/rbm.html. Accessed 07.08.2014.
R. Salakhutdinov and G. Hinton, Deep Boltzmann Machines (DBMs). http://www.cs.toronto.edu/~fritz/absps/dbm.pdf
Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng, “Building high-level features using large scale unsupervised learning,” in Proc. of ICML (Edinburgh, 2012).
Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proc. of ISCAS (Paris, 2010), pp. 253–256.
M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Weakly supervised object recognition with convolutional neural networks,” in Proc. of NIPS (Montreal, 2014).
M. Oquab, L. Bottou, I. Laptev, and J. Sivic, Learning and transferring mid-level image representations using convolutional neural networks (2013). http://hal.inria.fr/docs/00/91/11/79/PDF/paper.pdf
J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vision 104 (2), 154–171 (2013).
X. Wang, M. Yang, S. Zhu, and Y. Lin, “Regionlets for generic object detection,” in Proc. of ICCV (Sydney, 2013).
K. Kavukcuoglu, M. Ranzato, R. Fergus, and Y. LeCun, “Learning invariant features through topographic filter maps,” in Proc. of CVPR (Miami, 2009), pp. 1605–1612.
R-CNN–a visual object detection system. https://github.com/rbgirshick/rcnn
Caffe–a deep learning framework. http://caffe.berkeleyvision.org
nnForge Library. http://milakov.github.io/nnForge
DeepLearnToolbox. https://github.com/rasmusbergpalm/DeepLearnToolbox
Cuda-convnet–high-performance C++/CUDA implementation of convolutional neural networks. http://code.google.com/p/cuda-convnet
EBLearn–a machine learning library. http://eblearn.sourceforge.net
Cuda CNN Library. http://www.mathworks.com/matlabcentral/fileexchange/24291-cnn-convolutional-neural-network-class, https://bitbucket.org/intelligenceagent/cudacnnpublic/wiki/Home
DeepMat Library. https://github.com/kyunghyuncho/deepmat
Package Darch. http://cran.r-project.org/web/packages/darch/index.html
Software Environment R. http://www.r-project.org
Torch–a scientific computing framework. http://www.torch.ch
Theano Library. https://github.com/Theano/Theano, http://deeplearning.net/software/theano
Lush programming language. http://lush.sourceforge.net
Pylearn2–a machine learning library. http://deeplearning.net/software/pylearn2
Deepnet Library. https://github.com/nitishsrivastava/deepnet
DeCAF Framework. https://github.com/UCB-ICSI-Vision-Group/decaf-release
Cuda-convnet NYU. http://cs.nyu.edu/~wanli/dropc
Hebel–GPU-accelerated deep learning library. https://github.com/hannes-brt/hebel
CXXNET–a neural network toolkit. https://github.com/antinucleon/cxxnet
Crino–a neural network library. https://github.com/jlerouge/crino
A. Courville, J. Bergstra, and Y. Bengio, A spike and slab restricted Boltzmann machine (2011). http://jmlr.org/proceedings/papers/v31/luo13a.pdf
Y. He, K. Kavukcuoglu, Y. Wang, A. Szlam, and Y. Qi, “Unsupervised feature learning by deep sparse coding,” in Proc. of ICDM (Shenzhen, 2014), pp. 902–910.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, ImageNet large scale visual recognition challenge. http://arxiv.org/abs/1409.0575
C. Vens and F. Costa, “Random forest based feature induction,” in Proc. of ICDM (Vancouver, 2011), pp. 744–753.
V. Yu. Martyanov, A. N. Polovinkin, and E. V. Tuv, “Image classification with codebook based on decision tree ensembles,” in Proc. of Intelligent Information Processing Conf. (Guilin, 2012), pp. 480–482.
The Intel® Deep Learning Framework (IDLF). https://github.com/01org/idlf
Scikit-neuralnetwork Library. https://github.com/aigamedev/scikit-neuralnetwork
Additional information
This paper uses the materials of the report submitted at the 9th Open German-Russian Workshop on Pattern Recognition and Image Understanding, held in Koblenz, December 1–5, 2014 (OGRW-9-2014).
The article is published in the original.
Pavel Nikolaevich Druzhkov. Born 1989. Graduated from Lobachevsky State University of Nizhni Novgorod in 2012. He is a junior researcher at Lobachevsky State University of Nizhni Novgorod.
Research interests: machine learning and data mining, computer vision.
Number of publications (monographs and articles): 6.
Valentina Dmitrievna Kustikova. Born 1987. Graduated from Lobachevsky State University of Nizhni Novgorod in 2010. Year of dissertation completion (Candidate’s, Doctoral): 2015, Candidate of Engineering Sciences. Assistant at Lobachevsky State University of Nizhni Novgorod.
Research interests: computer vision, machine learning, parallel computing.
Number of publications (monographs and articles): 8.
About this article
Cite this article
Druzhkov, P.N., Kustikova, V.D. A survey of deep learning methods and software tools for image classification and object detection. Pattern Recognit. Image Anal. 26, 9–15 (2016). https://doi.org/10.1134/S1054661816010065
DOI: https://doi.org/10.1134/S1054661816010065