Imageability-Based Multi-modal Analysis of Urban Environments for Architects and Artists

  • Conference paper
Image Analysis and Processing. ICIAP 2022 Workshops (ICIAP 2022)


According to urban planner Kevin Lynch, imageability is the ability of a physical object to evoke a strong image in any viewer, making it memorable. The concept of imageability is important for architects and urban designers, helping them ensure that their creations meet the needs of citizens and improve the aesthetics of a place. Recently, computer vision and textual analysis techniques have been investigated for calculating the imageability of a place. In this paper, we propose a novel multi-modal system that utilises both visual and textual analysis methods to estimate the imageability score of a place. In addition, a deep learning model for image sentiment analysis has been developed to provide supplementary information about the sentiment that urban locations evoke in citizens. Finally, a text generation algorithm is used to explain the information extracted by the data analysis in the form of text, to facilitate the work of architects and urban designers.


  4. In this work, we experimented with free-text comments that were a priori about spaces, as stated in Sect. 4. We therefore left other challenges, such as separating texts about spaces from texts about happenings in spaces, for future research.

  5. The concept “rotonda” is taken from a user's description of a nearby roundabout that is not captured in the image.




This work was supported by the EC-funded research and innovation programme H2020 Mindspaces: “Art-driven adaptive outdoors and indoors design” under grant agreement No. 825079.

Corresponding author

Correspondence to Theodora Pistola.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pistola, T. et al. (2022). Imageability-Based Multi-modal Analysis of Urban Environments for Architects and Artists. In: Mazzeo, P.L., Frontoni, E., Sclaroff, S., Distante, C. (eds) Image Analysis and Processing. ICIAP 2022 Workshops. ICIAP 2022. Lecture Notes in Computer Science, vol 13373. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13320-6

  • Online ISBN: 978-3-031-13321-3

  • eBook Packages: Computer Science, Computer Science (R0)
