Image aesthetics assessment using composite features from transformer and CNN

  • Regular Paper
  • Published:
Multimedia Systems

Abstract

As a popular research problem in computational aesthetics, image aesthetic assessment has many important applications in image editing, retrieval, and recommendation. However, existing mainstream CNN-based image aesthetic assessment methods struggle to capture the global aesthetic attributes of images. To this end, we propose a two-stream image aesthetic assessment model that couples Transformer and CNN features. In the first stream, a traditional CNN extracts the image's local aesthetic features; in the second stream, a superpixel algorithm segments the image and the resulting image regions are fed into a Transformer to learn the image's global aesthetic features. Finally, the features learned by the Transformer and the CNN are fused to perform image aesthetic assessment. Experimental results on the AVA dataset show that the proposed method captures both local and global aesthetic information, enabling the model to learn richer aesthetic cues, and that combining the whole with its parts is more consistent with human aesthetic perception. The proposed model achieves an accuracy of 84.5% on the classification task, the best performance among compared methods, and also performs well on the other two tasks (score regression and distribution prediction).
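To make the two-stream design concrete, the sketch below is a minimal illustration, not the authors' exact architecture: the backbone choice, layer sizes, and the way region tokens are prepared from superpixel segments are all assumptions. It pairs a CNN stream for local features with a Transformer encoder over region tokens for global features, then concatenates both and predicts a 10-bin AVA score distribution.

```python
# Minimal two-stream sketch (illustrative assumptions, not the paper's exact model):
# a CNN stream for local features plus a Transformer stream over region tokens,
# fused to predict a 10-bin aesthetic score distribution.
import torch
import torch.nn as nn
from torchvision import models


class TwoStreamAesthetic(nn.Module):
    def __init__(self, num_regions=64, region_dim=768, num_bins=10):
        super().__init__()
        # Stream 1: CNN backbone extracts local aesthetic features.
        backbone = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 2048, 1, 1)

        # Stream 2: Transformer encoder over region tokens (one token per
        # superpixel region, each already pooled to a region_dim vector).
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=region_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=4)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, region_dim))

        # Fusion head: concatenate local (2048-d) and global (region_dim-d)
        # features and predict a 10-bin score distribution.
        self.head = nn.Sequential(
            nn.Linear(2048 + region_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_bins),
            nn.Softmax(dim=-1),
        )

    def forward(self, image, region_tokens):
        # image: (B, 3, H, W); region_tokens: (B, num_regions, region_dim)
        local_feat = self.cnn(image).flatten(1)                    # (B, 2048)
        cls = self.cls_token.expand(region_tokens.size(0), -1, -1)
        global_feat = self.transformer(
            torch.cat([cls, region_tokens], dim=1))[:, 0]          # (B, region_dim)
        return self.head(torch.cat([local_feat, global_feat], dim=1))


# Example: random tensors stand in for an image and its superpixel-region tokens.
model = TwoStreamAesthetic()
scores = model(torch.randn(2, 3, 224, 224), torch.randn(2, 64, 768))
print(scores.shape)  # torch.Size([2, 10])
```

In practice the region tokens would be obtained by pooling features inside each superpixel segment; here random tensors merely stand in for them to keep the sketch self-contained.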



Author information

Authors and Affiliations

Authors

Contributions

Yongzhen Ke: Conceptualization, Methodology, Supervision, Project administration. Yin Wang: Methodology, Software, Writing - Original Draft, Writing - Review & Editing. Kai Wang: Methodology, Software, Writing - Original Draft. Fan Qin: Validation, Writing - Review & Editing. Jing Guo: Writing - Review & Editing, Formal analysis, Visualization. Shuai Yang: Resources, Validation, Data Curation.

Corresponding author

Correspondence to Kai Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Conflict of interest

The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Communicated by B. Bao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ke, Y., Wang, Y., Wang, K. et al. Image aesthetics assessment using composite features from transformer and CNN. Multimedia Systems 29, 2483–2494 (2023). https://doi.org/10.1007/s00530-023-01141-7


  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-023-01141-7
