
Image categorization with resource constraints: introduction, challenges and advances

  • Review Article
  • Published in: Frontiers of Computer Science

Abstract

As one of the most classic fields in computer vision, image categorization has attracted widespread interest. Numerous algorithms have been proposed in the community, and many of them have advanced the state of the art. However, most existing algorithms are designed without considering the availability of computing resources. Therefore, when dealing with resource-constrained tasks, these algorithms fail to give satisfactory results. In this paper, we provide a comprehensive and in-depth introduction to recent research developments in image categorization with resource constraints. While a large portion is based on our own work, we also briefly describe other elegant algorithms. Furthermore, we investigate recent developments in deep neural networks, with a focus on resource-constrained deep nets.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 61422203).

Author information

Correspondence to Jianxin Wu.

Additional information

Jian-Hao Luo received his BS degree from the College of Computer Science and Technology, Jilin University, China in 2015. He is currently working toward his PhD degree in the Department of Computer Science and Technology, Nanjing University, China. His research interests are computer vision and machine learning.

Wang Zhou received his BS degree from the School of Computer Science and Engineering, University of Electronic Science and Technology of China, in 2014. He is currently a graduate student in the Department of Computer Science and Technology, Nanjing University, China. His research interests include computer vision and machine learning.

Jianxin Wu received his BS and MS degrees in computer science from Nanjing University, China and his PhD degree in computer science from the Georgia Institute of Technology, USA. He is currently a professor in the Department of Computer Science and Technology at Nanjing University, and is associated with the National Key Laboratory for Novel Software Technology, China. He was an assistant professor at Nanyang Technological University, Singapore, and has served as an area chair for ICCV 2015 and a senior PC member for AAAI 2016. His research interests are computer vision and machine learning. He was selected for the NSFC Excellent Young Scholars Program in 2014.

About this article

Cite this article

Luo, JH., Zhou, W. & Wu, J. Image categorization with resource constraints: introduction, challenges and advances. Front. Comput. Sci. 11, 13–26 (2017). https://doi.org/10.1007/s11704-016-5514-6

  • DOI: https://doi.org/10.1007/s11704-016-5514-6