Minimum volume simplex-based scene representation and attribute recognition with feature fusion

Applied Intelligence

Abstract

Scene attribute recognition aims to identify the attribute labels of a scene image based on its scene representation, enabling deeper semantic understanding of scenes. In the past decades, numerous algorithms for scene representation have been proposed, based on either feature engineering or deep convolutional neural networks. For models based on only one kind of image feature, it remains difficult to learn the representation of multiple attributes from local image regions. For models based on deep learning, although multi-label annotations can be used directly to learn attribute representations, large amounts of training data are usually necessary to build the multi-label model. In this paper, we investigate the problem through scene representation modeling with multiple features and without deep learning. First, we introduce the linear mixing model (LMM) for scene image modeling, and then present a novel approach, referred to as mini-batch minimum simplex estimation (MMSE), for attribute-based scene representation learning from highly complex image data. Finally, a two-stage multi-feature fusion method is proposed to further improve the feature representation for scene attribute recognition. The proposed method takes advantage of the fast convergence of nonnegative matrix factorization (NMF) schemes while using mini-batches to speed up computation on large-scale scene datasets. Experimental results on real scene images demonstrate that the proposed method outperforms several other advanced scene attribute recognition approaches.
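
To make the combination of a linear mixing model with mini-batch NMF more concrete, the following is a minimal sketch and not the MMSE algorithm of the paper. It assumes each image feature vector (a column of X) is a nonnegative mixture X ≈ W H of k basis vectors, and it applies the standard multiplicative NMF updates to one mini-batch at a time. The function name `minibatch_nmf` and all of its parameters are hypothetical, and the sketch enforces only nonnegativity: the sum-to-one abundance constraint and the minimum-volume simplex term are omitted.

```python
import numpy as np


def minibatch_nmf(X, k, batch_size=32, epochs=50, seed=0, eps=1e-9):
    """Illustrative mini-batch NMF for a linear mixing model X ~= W @ H.

    X is a nonnegative (d, n) matrix whose columns are feature vectors.
    Returns W with shape (d, k) and H with shape (k, n), both nonnegative.
    This is a generic sketch, not the method proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((d, k)) + eps  # basis vectors ("endmembers")
    H = rng.random((k, n)) + eps  # mixing coefficients ("abundances")
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb = X[:, idx]
            Hb = H[:, idx]  # fancy indexing returns a copy
            # Multiplicative update of the coefficients for this batch.
            Hb *= (W.T @ Xb) / (W.T @ W @ Hb + eps)
            # Multiplicative update of the basis using only this batch.
            W *= (Xb @ Hb.T) / (W @ (Hb @ Hb.T) + eps)
            H[:, idx] = Hb  # write the updated batch back
    return W, H


if __name__ == "__main__":
    # Synthetic data that follows the linear mixing model exactly.
    rng = np.random.default_rng(1)
    W_true = rng.random((20, 3))
    H_true = rng.dirichlet(np.ones(3), size=200).T  # columns sum to one
    X = W_true @ H_true
    W, H = minibatch_nmf(X, k=3)
    print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```

Because each update touches only one batch of columns, the cost per step depends on the batch size rather than on the size of the full dataset, which is the motivation the abstract gives for mini-batching.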

Acknowledgements

This research is partially supported by the Beijing Natural Science Foundation (No. 4212025) and the National Natural Science Foundation of China (Nos. 61876018 and 61976017).

Author information

Corresponding author

Correspondence to Weibin Liu.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zou, Z., Liu, W., Xing, W. et al. Minimum volume simplex-based scene representation and attribute recognition with feature fusion. Appl Intell 53, 8959–8977 (2023). https://doi.org/10.1007/s10489-022-03697-9

  • DOI: https://doi.org/10.1007/s10489-022-03697-9