Integrity-Preserving Image Aesthetic Assessment

  • Xin Sun
  • Jun Zhou
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 313)


Image aesthetic assessment is a challenging problem in computer vision. Most networks used for this task accept only inputs of a fixed size, so images are cropped, warped, or padded to a common size. These operations can damage the aesthetic quality of an image and make it inconsistent with its aesthetic rating label. In this paper, we present an end-to-end deep Multi-Task Spatial Pyramid Pooling Fully Convolutional Neural NasNet (MTP-NasNet) method for image aesthetic assessment that operates directly on an image at its original size without degrading its aesthetic quality. The method builds on the Fully Convolutional Network (FCN) and Spatial Pyramid Pooling (SPP). In addition, existing studies treat aesthetic assessment as a binary classification task, a score distribution prediction task, or a style prediction task, but ignore the correlation between these tasks. To address this issue, we adopt multi-task learning to combine the binary classification, style, and score distribution tasks. We also explore how statistics of the score distribution, such as its variance, can serve as a reference for the reliability of an image's rating. Experimental results on the large-scale AVA dataset [1] show that our approach performs well and demonstrate the importance of multi-task learning and size preservation. Our study provides a tool for image aesthetic assessment that can be applied to photography and image optimization.
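The size-preserving idea above rests on spatial pyramid pooling [23], which maps a convolutional feature map of any height and width to a vector of fixed length, so that the layers after it never see the input size. The following is a minimal NumPy sketch of that pooling step only, not the MTP-NasNet implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the use of max pooling are illustrative assumptions.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    Hypothetical helper for illustration. For each level n the map is
    split into an n x n grid of bins and each bin is max-pooled per
    channel, so the output length is C * sum(n * n for n in levels)
    regardless of H and W.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor for the start, ceil for the end,
            # so bins cover the whole map and are never empty.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


# Two feature maps with different spatial sizes (assumed shapes).
small = spatial_pyramid_pool(np.random.rand(8, 13, 17))
large = spatial_pyramid_pool(np.random.rand(8, 32, 20))
print(small.shape, large.shape)  # both (168,) = 8 channels * (1 + 4 + 16) bins
```

Because the output length depends only on the channel count and the pyramid levels, images of different sizes can share the same fully connected or task-specific heads without being cropped, warped, or padded first.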


Keywords: Multi-task learning, Image aesthetic assessment, Fully convolutional neural networks, Spatial pooling layer



This work was supported by the NSFC under Grants 61471234 and 61771303, and by the Science and Technology Commission of Shanghai Municipality (STCSM) under Grant 18DZ1200102.


  1. Murray, N., Marchesotti, L., Perronnin, F.: AVA: a large-scale database for aesthetic visual analysis. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2408–2415. IEEE (2012)
  2. Marchesotti, L., Perronnin, F., Larlus, D., Csurka, G.: Assessing the aesthetic quality of photographs using generic image descriptors. In: 2011 International Conference on Computer Vision, pp. 1784–1791. IEEE (2011)
  3. Larson, E.C., Chandler, D.M.: Most apparent distortion: full-reference image quality assessment and the role of strategy. J. Electron. Imaging 19(1), 011006 (2010)
  4. Yin, W., Mei, T., Chen, C.W., Li, S.: Socialized mobile photography: learning to photograph with social context via mobile devices. IEEE Trans. Multimed. 16(1), 184–200 (2013)
  5. Luo, Y., Tang, X.: Photo and video quality evaluation: focusing on the subject. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5304, pp. 386–399. Springer, Heidelberg (2008)
  6. Lu, X., Lin, Z., Shen, X., Mech, R., Wang, J.Z.: Deep multi-patch aggregation network for image style, aesthetics, and quality estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 990–998 (2015)
  7. Cui, C., Fang, H., Deng, X., Nie, X., Dai, H., Yin, Y.: Distribution-oriented aesthetics assessment for image search. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1013–1016. ACM (2017)
  8. Lu, X., Lin, Z., Jin, H., Yang, J., Wang, J.Z.: Rating image aesthetics using deep learning. IEEE Trans. Multimed. 17(11), 2021–2034 (2015)
  9. Talebi, H., Milanfar, P.: NIMA: neural image assessment. IEEE Trans. Image Process. 27(8), 3998–4011 (2018)
  10. Huang, G.B., Mattar, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments (2008)
  11. Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Facial landmark detection by deep multi-task learning. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 94–108. Springer, Cham (2014)
  12. Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. In: Advances in Neural Information Processing Systems, pp. 1988–1996 (2014)
  13. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
  14. Brandão, T., Queluz, M.P.: No-reference quality assessment of H.264/AVC encoded video. IEEE Trans. Circuits Syst. Video Technol. 20(11), 1437–1447 (2010)
  15. Datta, R., Joshi, D., Li, J., Wang, J.Z.: Studying aesthetics in photographic images using a computational approach. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 288–301. Springer, Heidelberg (2006)
  16. Sun, X., Yao, H., Ji, R., Liu, S.: Photo assessment based on computational visual attention model. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 541–544. ACM (2009)
  17. Peng, K.C., Chen, T.: Toward correlating and solving abstract tasks using convolutional neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–9. IEEE (2016)
  18. Cui, C., Liu, H., Lian, T., Nie, L., Zhu, L., Yin, Y.: Distribution-oriented aesthetics assessment with semantic-aware hybrid network. IEEE Trans. Multimed. 21(5), 1209–1220 (2018)
  19. Evgeniou, T., Pontil, M.: Regularized multi-task learning. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 109–117. ACM (2004)
  20. Jebara, T.: Multitask sparsity via maximum entropy discrimination. J. Mach. Learn. Res. 12, 75–110 (2011)
  21. Argyriou, A., Evgeniou, T., Pontil, M.: Convex multi-task feature learning. Mach. Learn. 73(3), 243–272 (2008)
  22. Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806 (2014)
  23. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015)
  24. Liu, X., Gao, J., He, X., Deng, L., Duh, K., Wang, Y.Y.: Representation learning using multi-task deep neural networks for semantic classification and information retrieval (2015)
  25. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
  26. Yang, Y., Hospedales, T.M.: Trace norm regularised deep multi-task learning. arXiv preprint arXiv:1606.04038 (2016)
  27. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7482–7491 (2018)
  28. Liu, P., Qiu, X., Huang, X.: Adversarial multi-task learning for text classification. arXiv preprint arXiv:1704.05742 (2017)
  29. Geng, X.: Label distribution learning. IEEE Trans. Knowl. Data Eng. 28(7), 1734–1748 (2016)
  30. Geng, X., Hou, P.: Pre-release prediction of crowd opinion on movies by label distribution learning. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
  31. Geng, X., Yin, C., Zhou, Z.H.: Facial age estimation by learning from label distributions. IEEE Trans. Pattern Anal. Mach. Intell. 35(10), 2401–2412 (2013)
  32. Mai, L., Jin, H., Liu, F.: Composition-preserving deep photo aesthetics assessment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 497–506 (2016)
  33. Wu, O., Hu, W., Gao, J.: Learning to predict the perceived visual quality of photos. In: 2011 International Conference on Computer Vision, pp. 225–232. IEEE (2011)
  34. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2020

Authors and Affiliations

  1. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
