Deep Reinforcement Learning for Automatic Thumbnail Generation

  • Zhuopeng Li
  • Xiaoyan Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)


An automatic thumbnail generation method based on deep reinforcement learning (called RL-AT) is proposed in this paper. Unlike previous saliency-based and deep learning-based methods, which predict the location and size of a rectangular region, our method models thumbnail generation as predicting a rectangular region by cutting along its four edges. We formulate the cutting operations as a four-step Markov decision process within a deep reinforcement learning framework. The best cropping location at each cutting step is learned by a deep Q-network. The deep Q-network observes the current image, selects an action from the action space, and then receives a reward based on the selected action. The action space and reward function are specifically designed for the thumbnail generation problem. A data set with more than 70,000 thumbnail annotations is used to train our RL-AT model. Our RL-AT model generates thumbnails efficiently with low computational complexity, needing only 0.09 s per thumbnail. Experiments show that our RL-AT model outperforms related methods for thumbnail generation.
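The four-step cutting process described in the abstract can be illustrated with a minimal Python sketch. Several details are assumptions that the abstract does not state: the edge order (left, top, right, bottom), the discrete cut fractions in the hypothetical `CUT_FRACTIONS`, and a reward defined as the IoU improvement with the annotated thumbnail. The deep Q-network is replaced by a caller-supplied function `q_fn`, and the usage example plugs in a hypothetical oracle that scores each cut by its immediate reward. This is not the authors' network, action space, or reward definition.

```python
from typing import Callable, Sequence, Tuple

# (x0, y0, x1, y1): left, top, right, bottom coordinates in pixels
Box = Tuple[float, float, float, float]

# Hypothetical action space: fraction of the current width or height to cut
# away from the active edge (0.0 = leave the edge where it is).
CUT_FRACTIONS: Tuple[float, ...] = (0.0, 0.05, 0.1, 0.2, 0.3)

# Assumed order in which the four edges are cut, one edge per MDP step.
EDGES: Tuple[str, ...] = ("left", "top", "right", "bottom")


def apply_cut(box: Box, edge: str, frac: float) -> Box:
    """Move one edge of `box` inward by `frac` of the current width/height."""
    x0, y0, x1, y1 = box
    dx, dy = (x1 - x0) * frac, (y1 - y0) * frac
    if edge == "left":
        return (x0 + dx, y0, x1, y1)
    if edge == "top":
        return (x0, y0 + dy, x1, y1)
    if edge == "right":
        return (x0, y0, x1 - dx, y1)
    if edge == "bottom":
        return (x0, y0, x1, y1 - dy)
    raise ValueError(f"unknown edge: {edge}")


def _area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def reward(old: Box, new: Box, target: Box) -> float:
    """Assumed reward: how much the cut improved IoU with the annotated thumbnail."""
    return iou(new, target) - iou(old, target)


def td_target(r: float, next_q: Sequence[float], done: bool, gamma: float = 0.9) -> float:
    """One-step Q-learning target r + gamma * max_a' Q(s', a'); no bootstrap after the last cut."""
    return r if done else r + gamma * max(next_q)


def run_episode(box: Box, q_fn: Callable[[Box, int], Sequence[float]]) -> Box:
    """Greedy four-step rollout: at step t, cut EDGES[t] by the arg-max-Q fraction."""
    for step, edge in enumerate(EDGES):
        q = q_fn(box, step)  # one Q-value per entry of CUT_FRACTIONS
        best = max(range(len(q)), key=q.__getitem__)
        box = apply_cut(box, edge, CUT_FRACTIONS[best])
    return box


if __name__ == "__main__":
    full = (0.0, 0.0, 100.0, 100.0)    # whole image
    target = (10.0, 10.0, 90.0, 80.0)  # hypothetical annotated thumbnail

    # Hypothetical stand-in for a trained Q-network: score each cut by its
    # immediate reward (only possible here because the target is known).
    def oracle_q(b: Box, step: int) -> Sequence[float]:
        return [reward(b, apply_cut(b, EDGES[step], f), target) for f in CUT_FRACTIONS]

    print(run_episode(full, oracle_q))  # prints (10.0, 10.0, 91.0, 82.0)
```

Because each episode has exactly four steps, the greedy rollout in this sketch calls `q_fn` only four times per image. A real deep Q-network would instead compute its Q-values from features of the current crop, and `td_target` shows the kind of regression target such a network could be trained toward.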


Keywords: Thumbnail generation · Reinforcement learning · Q-network


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
