Deep Learning the City: Quantifying Urban Perception at a Global Scale

  • Abhimanyu Dubey
  • Nikhil Naik
  • Devi Parikh
  • Ramesh Raskar
  • César A. Hidalgo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9905)


Computer vision methods that quantify the perception of urban environments are increasingly being used to study the relationship between a city’s physical appearance and the behavior and health of its residents. Yet the throughput of current methods is too limited to quantify the perception of cities across the world. To tackle this challenge, we introduce a new crowdsourced dataset containing 110,988 images from 56 cities, and 1,170,000 pairwise comparisons provided by 81,630 online volunteers along six perceptual attributes: safe, lively, boring, wealthy, depressing, and beautiful. Using this data, we train a Siamese-like convolutional neural architecture, which learns from a joint classification and ranking loss, to predict human judgments of pairwise image comparisons. Our results show that crowdsourcing combined with neural networks can produce urban perception data at the global scale.
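To make the training objective described above more concrete, here is a minimal pure-Python sketch of a joint classification and ranking loss for a single pairwise comparison. It is an illustration under stated assumptions, not the authors' implementation: the one-layer `shared_features` trunk is a hypothetical stand-in for the CNN, and the hinge form of the ranking loss, the two-way softmax classifier on concatenated pair features, the margin and the weight `lam` are all assumed choices.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def shared_features(image: Sequence[float], trunk: Matrix) -> List[float]:
    """Hypothetical stand-in for the shared CNN trunk: one linear layer + ReLU.

    Both images of a pair pass through the same `trunk` weights, which is
    what makes the two-branch model Siamese-like.
    """
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in trunk]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def ranking_loss(score_left: float, score_right: float, label: int,
                 margin: float = 1.0) -> float:
    """Assumed hinge ranking loss; label is +1 if the left image won, -1 otherwise."""
    return max(0.0, margin - label * (score_left - score_right))


def classification_loss(logits: Sequence[float], winner: int) -> float:
    """Numerically stable two-way softmax cross-entropy; winner is 0 (left) or 1 (right)."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[winner]


def joint_loss(left: Sequence[float], right: Sequence[float], label: int,
               trunk: Matrix, rank_w: Sequence[float], class_w: Matrix,
               lam: float = 1.0) -> float:
    """Classification loss plus lam times ranking loss for one comparison."""
    f_left = shared_features(left, trunk)
    f_right = shared_features(right, trunk)
    # Ranking head: one scoring vector shared by both branches gives a per-image score.
    s_left, s_right = dot(rank_w, f_left), dot(rank_w, f_right)
    # Classification head: sees the concatenated features of the pair.
    logits = [dot(row, f_left + f_right) for row in class_w]
    winner = 0 if label == 1 else 1
    return classification_loss(logits, winner) + lam * ranking_loss(s_left, s_right, label)


# Usage: a toy pair in which the left image won the comparison.
trunk = [[1.0, 0.0], [0.0, 1.0]]
rank_w = [1.0, 0.0]
class_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# The margin is already satisfied, so only the softmax term (about 0.3133) remains.
print(joint_loss([2.0, 0.0], [0.0, 1.0], +1, trunk, rank_w, class_w))
```

Because both branches share `trunk` and `rank_w`, the ranking head produces a single score per image that can be compared across any pair, while the classification head only ever judges the two images jointly; combining the two terms lets each signal regularize the other.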


Keywords: Perception, Attributes, Street view, Crowdsourcing

Supplementary material

Supplementary material 1 (PDF, 19.2 MB)


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Abhimanyu Dubey (1)
  • Nikhil Naik (3), corresponding author
  • Devi Parikh (2)
  • Ramesh Raskar (3)
  • César A. Hidalgo (3)

  1. Indian Institute of Technology, Delhi, India
  2. Virginia Tech, Blacksburg, USA
  3. MIT Media Lab, Cambridge, USA
