
Learning region-wise deep feature representation for image analysis

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Effective feature representation plays an important role in image analysis tasks. In recent years, deep features have replaced hand-crafted features as the mainstream representation in image analysis. However, existing deep learning methods typically extract feature representations directly from the whole image. Such strategies concentrate on global features; they tend to fail to capture local geometric invariance, and they introduce noise from regions of no interest. In this paper, we propose a novel region-wise deep feature extraction framework that promotes local geometric invariance and reduces noise. In our algorithm, an object proposal method is adopted to generate a set of foreground object bounding boxes, from which a pre-trained convolutional neural network (CNN) model extracts region-wise deep features. Then, an improved vector of locally aggregated descriptors (VLAD) strategy with weighted multi-neighbor assignment is proposed to encode these local region-wise features into a single image representation. The final representation is not restricted to classification: it can also be further quantized into hash codes for large-scale image retrieval. Extensive experiments on publicly available datasets demonstrate the promising performance of our method against state-of-the-art methods in both image retrieval and classification tasks.
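The encoding and hashing steps described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the weighting rule for multi-neighbor assignment, so the sketch assumes each region descriptor is softly assigned to its k nearest codewords with weights proportional to exp(-beta * squared distance). It also assumes the usual VLAD signed square-root and L2 normalization, and it stands in for the paper's hashing step with the sign of a random Gaussian projection. The function names, the parameters k, beta and n_bits, and the random codebook and region descriptors are all hypothetical; in practice the codebook would be learned (for example by k-means) from CNN features of proposal regions.

```python
import numpy as np


def weighted_multi_neighbor_vlad(descriptors, centroids, k=3, beta=1.0):
    """Aggregate region-wise descriptors (n, d) into one VLAD vector (K*d,).

    Assumption (not the paper's exact rule): each descriptor is assigned to
    its k nearest centroids, weighted by a softmax over -beta * squared distance.
    """
    descriptors = np.asarray(descriptors, dtype=np.float64)
    centroids = np.asarray(centroids, dtype=np.float64)
    n_centroids, dim = centroids.shape
    vlad = np.zeros((n_centroids, dim))

    # Squared Euclidean distance from every descriptor to every centroid: (n, K)
    sq_dist = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

    for i, x in enumerate(descriptors):
        nearest = np.argsort(sq_dist[i])[:k]       # indices of the k nearest centroids
        d = sq_dist[i, nearest]
        w = np.exp(-beta * (d - d.min()))          # shift by the minimum for numerical stability
        w /= w.sum()                               # weights sum to 1 per descriptor
        for weight, c in zip(w, nearest):
            vlad[c] += weight * (x - centroids[c])  # accumulate weighted residuals

    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))   # signed square-root (power) normalization
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad       # L2 normalization


def binarize(features, n_bits=64, seed=0):
    """Hypothetical hashing stand-in: sign of a random Gaussian projection, as 0/1 bits."""
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((features.shape[-1], n_bits))
    return (features @ projection > 0).astype(np.uint8)


# Usage with hypothetical data: an 8-word codebook of 16-D features and
# 20 region descriptors standing in for CNN features of object proposals.
rng = np.random.default_rng(42)
centroids = rng.standard_normal((8, 16))
regions = rng.standard_normal((20, 16))
v = weighted_multi_neighbor_vlad(regions, centroids, k=3)
code = binarize(v, n_bits=64)
print(v.shape, code.shape)  # (128,) (64,)
```

With k=1 each descriptor's weight collapses to 1 and the encoder reduces to standard hard-assignment VLAD; a larger k spreads each region's residual over several nearby codewords, which is the intuition behind multi-neighbor assignment.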

Fig. 1
Fig. 2
Fig. 3



Acknowledgements

This work was supported by the National Key R&D Program of China (2017YFB1401000) and the National Natural Science Foundation of China (61501457, 61602517). The corresponding authors are Peng Li and Xiao-Yu Zhang, who contributed equally to this paper.

Author information

Corresponding author

Correspondence to Xiaobin Zhu.

Additional information

Peng Li and Xiao-Yu Zhang contributed equally to this paper.

About this article

Cite this article

Zhu, X., Wang, Q., Li, P. et al. Learning region-wise deep feature representation for image analysis. J Ambient Intell Human Comput 14, 14775–14784 (2023). https://doi.org/10.1007/s12652-018-0894-0

  • DOI: https://doi.org/10.1007/s12652-018-0894-0