
iPoet: interactive painting poetry creation with visual multimodal analysis


Chinese painting poetry is an extraordinary aesthetic phenomenon in world art history. It is not only part of the painting but also helps viewers better understand the spiritual conception that the artist expresses. In this paper, we present an interactive visual system that enables ordinary users to compose customized painting poetry for ancient Chinese paintings. The system has three properties: (1) We employ object detection and image captioning to describe the scenery depicted in the painting. (2) We extend modern color theory to analyze the underlying emotions of each painting. (3) We propose an interactive poetry generation method that takes both the content description and the emotional expression as input to increase the diversity of the generated poetry. Several visual components are carefully designed to visualize and contextualize the features of the painting, and they guide users in steering the creation of personalized painting poems. We conduct case studies and user interviews to demonstrate the effectiveness of our system.
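To make the color-based emotion step more concrete, the following is a minimal sketch of one way a painting's dominant hue could be mapped to an emotion label. It is not the method used in the paper: the hue bands, the saturation and brightness thresholds, the emotion labels in `EMOTION_BY_HUE`, and the function names `hue_family` and `dominant_emotion` are all hypothetical and chosen only for illustration.

```python
import colorsys
from collections import Counter

# Hypothetical pairing of coarse hue families with emotion labels.
# Placeholder values for illustration; not taken from the paper.
EMOTION_BY_HUE = {
    "red": "passionate",
    "yellow": "cheerful",
    "green": "peaceful",
    "blue": "melancholy",
    "purple": "mysterious",
}


def hue_family(rgb):
    """Return a coarse hue family for one 8-bit RGB pixel, or None if near-grey."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < 0.2 or v < 0.2:
        return None  # too washed-out or too dark to carry a clear hue
    degrees = h * 360.0
    if degrees < 30 or degrees >= 330:
        return "red"
    if degrees < 90:
        return "yellow"
    if degrees < 180:
        return "green"
    if degrees < 270:
        return "blue"
    return "purple"


def dominant_emotion(pixels):
    """Pick the emotion label of the most frequent hue family among the pixels."""
    counts = Counter(
        family for family in map(hue_family, pixels) if family is not None
    )
    if not counts:
        return "neutral"  # only grey or very dark pixels were seen
    family, _ = counts.most_common(1)[0]
    return EMOTION_BY_HUE[family]


if __name__ == "__main__":
    # Three blue-ish pixels, one red pixel, five grey pixels.
    sample = [(30, 60, 200)] * 3 + [(200, 30, 30)] + [(128, 128, 128)] * 5
    print(dominant_emotion(sample))
```

In an interactive setting, a label of this kind could be shown to the user as an editable suggestion rather than a fixed result, so that the user remains in control of the emotional tone passed on to poetry generation.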




This work is supported by the National Natural Science Foundation of China (61972122, 61772456).

Author information



Corresponding author

Correspondence to Jiazhou Chen.


About this article


Cite this article

Feng, Y., Chen, J., Huang, K. et al. iPoet: interactive painting poetry creation with visual multimodal analysis. J Vis (2021).



Keywords

  • Poetry creation
  • Chinese painting
  • Visual analysis
  • Multimodal analysis