Urban-semantic computer vision: a framework for contextual understanding of people in urban spaces

Vanky, Anthony; Le, Ri

doi:10.1007/s00146-022-01625-6

Main Paper
Published: 10 January 2023

Volume 38, pages 1193–1207, (2023)

AI & SOCIETY


Abstract

Increasing computational power and improving deep learning methods have made computer vision technologies pervasive in urban environments. Their applications in policing, traffic management, and documenting public spaces are increasingly common (Ridgeway 2018, Coifman et al. 1998, Sun et al. 2020). Beyond the often-discussed biases in the algorithms' training and their unequally borne benefits (Khosla et al. 2012), almost all applications similarly reduce urban experiences to simplistic and mechanistic measures. These practices lack the context, depth, and specificity that would enable semantic knowledge or analysis within urban contexts, especially with regard to how people use and occupy urban space. This paper critiques existing uses of artificial intelligence and computer vision in urban practices in order to propose a new framework for understanding people, action, and public space. It revisits Geertz's (1973) use of thick descriptions in generating interpretive theories of culture and activity and uses this lens to establish a framework for evaluating the varied uses of computer vision technologies that weigh meaning. By discussing implemented examples of urban computer vision, from LinkNYC and Numina's urban measurements to the Detroit Police's use of DataWorks Plus's facial recognition technology, it proposes a framework for evaluating the thickness of an algorithm's conclusions against the complexity of the computational method required to produce that outcome. Further, we discuss how the framework's positioning may differ (and conflict) between different users of the technology, from engineer to urban planner and policymaker to citizen.
This paper also discusses how the current use and training of deep learning algorithms limit semantic learning and proposes three potential methodologies for gaining a more contextually specific, urban-semantic description of urban space relevant to urbanists. It contributes to the critical conversations regarding the proliferation of artificial intelligence by challenging the current applications of these technologies in the urban environment and highlighting their failures within this context, while also proposing an evolution of these algorithms that may ultimately make them sensitive and useful within this spatial and cultural milieu.

Data availability

No datasets were generated or analyzed during the current study.

Notes

“Imageable” here is taken to mean a cognitive, memory-based image, as used in the earlier reference to Lynch’s work.
These together form the commonly used “four V’s” of big data: velocity, veracity, volume and variety.
While this paper will not comprehensively review the technology and its application depth, other papers have sought to categorize various approaches. See Ibrahim et al. 2020, for instance.

References

ABI Research (2021) Deep learning-based machine vision in smart cities. https://www.abiresearch.com/press/global-installed-base-smart-city-cameras-ai-chipset-reach-over-350-million-2025/
Ackerman D (2017) Google maps street view celebrates its 10th birthday. CNet. https://www.cnet.com/news/google-maps-street-view-celebrates-its-10th-birthday/
ACLU NY (2016) NYCLU: city’s public wi-fi raises privacy concerns.
Al-Faris M, Chiverton J, Ndzi D, Ahmed AI (2020) A review on computer vision-based methods for human action recognition. J Imag. https://doi.org/10.3390/jimaging6060046
Anguelov D, Dulong C, Filip D, Frueh C, Lafon S, Lyon R, Ogale A, Vincent L, Weaver J (2010) Google street view: capturing the world at street level. Computer 43(6):32–38. https://doi.org/10.1109/MC.2010.170
Attribute detection with Body Camera Analytics (2020) IBM intelligent video analytics documentation. https://www.ibm.com/docs/en/iva/2.0.0?topic=video-attribute-detection-body-camera-analytics
Azar M, Cox G, Impett L (2021) Introduction: ways of machine seeing. AI and society. Springer Science and Business Media Deutschland GmbH, Berlin, pp 1–12. https://doi.org/10.1007/s00146-020-01124-6
Berlyn DE (1971) Aesthetics and psychobiology. Appleton-Century-Crofts
Brannen J (2005) Mixing methods: the entry of qualitative and quantitative approaches into the research process. Int J Soc Res Methodol Theory Pract 8(3):173–184. https://doi.org/10.1080/13645570500154642
Brill M (1989) An ontology for exploring urban public life today. Places 6(1):24–31. http://escholarship.org/uc/item/4kc602c7
byronv2 (2019) Texting one another [photograph]. Flickr. https://flic.kr/p/23B3Jc4
byronv2 (2020a) Ice Cream Time [photograph]. Flickr. https://flic.kr/p/2jjDBQv
byronv2 (2020b) Lunch al Fresco [photograph]. Flickr. https://flic.kr/p/2iEczU1
Chetan V (2019) Man jumping from a rock [photograph]. Pexels. https://www.pexels.com/photo/man-jumping-from-a-rock-2923157/
Chidster M (1989) Public places, private lives: plazas and the broader public. Places 6(1):32–37. http://escholarship.org/uc/item/9gr5n6hd
Collins RL (2011) Content analysis of gender roles in media: where are we now and where should we go? Sex Roles 64(3):290–298. https://doi.org/10.1007/s11199-010-9929-5
Collins J (2020) Police bodycam video shows George Floyd’s distress during fatal arrest. NPR. https://www.npr.org/2020/07/15/891516654/police-bodycam-video-provides-fuller-picture-of-george-floyds-fatal-arrest
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-Decem, 3213–3223. https://doi.org/10.1109/CVPR.2016.350
Crawford K (2018) Artificial intelligence’s white guy problem. The New York Times. https://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html
Czarniawska B (1992) Exploring complex organizations: a cultural perspective: toward an anthropological perspective. SAGE, Singapore
Dahlberg L (2015) Charles Marville, photographer of Paris/piercing time: Paris after Marville and Atget, 1865–2012. Hist Photogr 39(2):194–196. https://doi.org/10.1080/03087298.2015.1035533
Deng J, Dong W, Socher R, Li L-J, Kai L, Li F-F (2009) ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 20(11): 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Desmond R, Danilewicz A (2010) Women are on, but not in, the news: gender roles in local television news. Sex Roles 62(11):822–829. https://doi.org/10.1007/s11199-009-9686-5
Dreyfus HL (1992) What computers still can’t do: a critique of artificial reason. The MIT Press
Duarte F, DeSouza P (2020) Data science and cities: a critical approach. Harvard Data Sci Rev. https://doi.org/10.1162/99608f92.b3fc5cc8
Eagle N, Pentland AS (2009) Eigenbehaviors: identifying structure in routine. 1057–1066. https://doi.org/10.1007/s00265-009-0739-0
Garvie C, Moy LM (2019) America under watch. https://www.americaunderwatch.com/
Geertz C (1973) Thick description: toward an interpretive theory of culture. In: Turning points in qualitative research: Tying knots in a handkerchief, pp 143–168
Gehl J (1987) Life between buildings: using public space. Island Press
Gershenson C (2013) The implications of interactions for science and philosophy. Found Sci 18(4):781–790. https://doi.org/10.1007/s10699-012-9305-8
Gill KS (2020) Prediction paradigm: the human price of instrumentalism. AI Soc 35(3):509–517. https://doi.org/10.1007/s00146-020-01035-6
Girardin F, Calabrese F, Fiore FD, Ratti C, Blat J (2008) Digital footprinting: uncovering tourists with user-generated content. IEEE Pervasive Comput 7(4):36–43. https://doi.org/10.1109/MPRV.2008.71
Goldsmith S, Crawford S (2014) The city as digital platform. In: The responsive city. Jossey-Bass
Greenfield A (2013) Against the smart city. Do Projects
Hand DJ (2020) Dark data: why what you don’t know matters. Princeton University Press
Harwell D (2019) Ring, the doorbell-camera firm, has partnered with 400 police forces, extending surveillance reach. The Washington Post. https://www.washingtonpost.com/technology/2019/08/28/doorbell-camera-firm-ring-has-partnered-with-police-forces-extending-surveillance-reach/
Hernandez J, Hoque M, Drevo W, Picard RW (2012) Mood meter: counting smiles in the wild. Proceedings of the 2012 ACM Conference on Ubiquitous Computing - UbiComp ’12, 301. https://doi.org/10.1145/2370216.2370264
Hill K (2020) Wrongfully accused by an algorithm. The New York Times. https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html
Hinchcliffe T (2010) Aerial photography and the Postwar urban planner in London. Lond J 35(3):277–288. https://doi.org/10.1179/174963210X12814015170232
hjl (2012) Blind date—green park [photograph]. Flickr. https://flic.kr/p/cBGctS
Hollands RG (2008) Will the real smart city please stand up? City 12(3):303–320. https://doi.org/10.1080/13604810802479126
Ibrahim MR, Haworth J, Cheng T (2020) Understanding cities with machine eyes: a review of deep computer vision in urban analytics. Cities 96:102481. https://doi.org/10.1016/j.cities.2019.102481
Idrees H, Zamir AR, Jiang Y-G, Gorban A, Laptev I, Sukthankar R, Shah M (2017) The THUMOS challenge on action recognition for videos “in the wild.” Comput vis Image Underst 155:1–23. https://doi.org/10.1016/j.cviu.2016.10.018
Jacobs J (1970) The economy of cities. Random House
Jacobs A, Appleyard D (1987) Toward an urban design manifesto. J Am Plann Assoc 53(1):112–120. https://doi.org/10.1080/01944368708976642
Jacobs J (1961) The death and life of great American cities. Vintage Books. https://books.google.com/books?hl=en&lr=&id=P_bPTgOoBYkC&oi=fnd&pg=PA7&ots=JW1O38Fpf5&sig=X-9dkYK56vjYblU9O1I-kh0yYFQ#v=onepage&q&f=false
Jemielniak D (2020) Thick big data. Oxford University Press, Oxford. https://doi.org/10.1093/oso/9780198839705.001.0001
Jiang S, Fiore GA, Yang Y, Ferreira J, Frazzoli E, González MC (2013) A review of urban computing for mobile phone traces: current methods, challenges and opportunities. UrbComp
Kirchner L, Mattu S, Larson J, Angwin J (2016) Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Kofman A (2018) Are New York’s free LinkNYC internet kiosks tracking your movements? The Intercept. https://theintercept.com/2018/09/08/linknyc-free-wifi-kiosks/
Krasin I, Duerig T, Alldrin N, Ferrari V, Abu-El-Haija S, Kuznetsova A, Rom H, Uijlings J, Popov S, Kamali S, Malloci M, Pont-Tuset J, Veit A, Bel K (2017) OpenImages: A public dataset for large-scale multi-label and multi-class image classification. https://storage.googleapis.com/openimages/web/index.html
Kubo M, Pasnik M, Grimley C (2010) Tough love: in defense of brutalism. Architect Magazine. https://www.architectmagazine.com/design/tough-love-in-defense-of-brutalism_o
Kwet M (2020) The rise of the video surveillance industrial complex. The Intercept. https://theintercept.com/2020/01/27/surveillance-cctv-smart-camera-networks/
Le Corbusier (1935) Aircraft. The Studio.
Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
Lee TB (2020) Detroit police chief cops to 96-percent facial recognition error rate. Ars Technica. https://arstechnica.com/tech-policy/2020/06/detroit-police-chief-admits-facial-recognition-is-wrong-96-of-the-time/
Li X, Zhang C, Li W, Ricard R, Meng Q, Zhang W (2015) Assessing street-level urban greenery using Google street view and a modified green view index. Urban for Urban Green 14(3):675–685. https://doi.org/10.1016/j.ufug.2015.06.006
Lin L, Purnell N (2019) A world with a billion cameras watching you is just around the corner. Wall Street J. https://www.wsj.com/articles/a-billion-surveillance-cameras-forecast-to-be-watching-within-two-years-11575565402
Lynch K (1960) The image of the city. MIT Press
IHS Markit (2019) Security technologies top trends for 2019. In: IHS markit security technologies. https://technology.informa.com/Research-by-Market/551540/security-technology
Massaro E, Ahn C, Ratti C, Santi P, Stahlmann R, Lamprecht A, Roehder M, Huber M (2017) The car as an ambient sensing platform [point of view]. Proc IEEE 105(1):3–7. https://doi.org/10.1109/JPROC.2016.2634938
Mayor’s Office for New Urban Mechanics (2018) Beta blocks. City of Boston
McDuff D, El Kaliouby R, Demirdjian D, Picard R (2013a) Predicting online media effectiveness based on smile responses gathered over the Internet. 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013. https://doi.org/10.1109/FG.2013.6553750
McDuff D, El Kaliouby R, Senechal T, Amr M, Cohn JF, Picard R (2013b) Affectiva-MIT facial expression dataset (AM-FED): naturalistic and spontaneous facial expressions collected “in-the-wild.” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 881–888. https://doi.org/10.1109/CVPRW.2013.130
Mozur P (2019) One month, 500,000 face scans: how China is using A.I. to profile a minority. The New York Times. https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html
Naik N, Philipoom J (2014) Streetscore-predicting the perceived safety of one million streetscapes. Proc IEEE. https://doi.org/10.1109/CVPRW.2014.121
Norden E (1969) Marshall McLuhan—a candid conversation with the high priest of popcult and metaphysician of media. Essential McLuhan 2:233–270
Noueihed L (2011) Peddler’s martyrdom launched Tunisia’s revolution. Reuters. https://www.reuters.com/article/tunisia-protests-bouazizi-idAFLDE70G18J20110119
O’Hara S, Lui YM, Draper BA (2011) Unsupervised learning of human expressions, gestures, and actions. Face Gest 2011:1–8. https://doi.org/10.1109/FG.2011.5771473
Offenhuber D, Nabian N, Vanky A, Ratti C (2013) Data dimension: accessing urban data and making it accessible. Proc ICE Urban Des Plann 166(1):60–75. https://doi.org/10.1680/udap.12.00011
Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R (2013) Berkeley MHAD: a comprehensive multimodal human action database. IEEE Workshop Appl Comput vis (WACV) 2013:53–60. https://doi.org/10.1109/WACV.2013.6474999
Paglen T (2016) Invisible images (your pictures are looking at you). The New Inquiry. https://thenewinquiry.com/invisible-images-your-pictures-are-looking-at-you/
Pasquinelli M (2015) Anomaly detection: the mathematization of the abnormal in the metadata society. Transmed Festiv 2:1–10
Patron-Perez A, Marszalek M, Reid I, Zisserman A (2012) Structured learning of human interactions in TV shows. IEEE Trans Pattern Anal Mach Intell 34(12):2441–2453. https://doi.org/10.1109/TPAMI.2012.24
Picard RW (1995) Affective computing. In: Perceptual computing section technical reports (Issue 221)
Pickles J (1997) Tool or science? GIS, technoscience, and the theoretical turn. Ann Assoc Am Geogr 87(2):363–372. https://doi.org/10.1111/0004-5608.00058
Rice S (1997) Parisian views. The MIT Press
Rossman GB, Rallis SF (2017) An introduction to qualitative research: learning in the field, 4th edn. SAGE Publications Inc, Singapore. https://doi.org/10.4135/9781071802694
Salesses P, Schechtner K, Hidalgo C (2013) The collaborative image of the city: mapping the inequality of urban perception. PLoS ONE. https://doi.org/10.1371/journal.pone.0068400
Schwarzer M (2017) Computation and the impact of new technologies on the photography of architecture and urbanism. Architect MPS. https://doi.org/10.14324/111.444.amps.2017v11i4.001
Seer S, Brändle N, Ratti C (2014) Kinects and human kinetics: a new approach for studying pedestrian behavior. Transport Res Part C Emerg Technol 48:212–228. https://doi.org/10.1016/j.trc.2014.08.012
Selinger E, Fox Cahn A (2020) Did you protest recently? Your face might be in a database. The Guardian. https://www.theguardian.com/commentisfree/2020/jul/17/protest-black-lives-matter-database
Shankar S, Halpern Y, Breck E, Atwood J, Wilson J, Sculley D (2017) No classification without representation: assessing geodiversity issues in open data sets for the developing world. ArXiv. http://arxiv.org/abs/1711.08536
Shepardson D (2020) IBM says U.S. should adopt new export controls on facial recognition systems. Reuters. https://www.reuters.com/article/us-ibm-facial-recognition-exports/ibm-says-u-s-should-adopt-new-export-controls-on-facial-recognition-systems-idUSKBN2621PV
Smaira L, Carreira J, Noland E, Clancy E, Wu A, Zisserman A (2020) A short note on the kinetics-700–2020 human action dataset. ArXiv. http://arxiv.org/abs/2010.10864
Soomro K, Shah M (2017) Unsupervised action discovery and localization in videos. IEEE Int Conf Comput vis (ICCV) 2017:696–705. https://doi.org/10.1109/ICCV.2017.82
Spatial Analysis Lab (2019) Ethnicity linguistic landscape data. https://slab.today/2019/09/ethnicity-lld/
Stanley J (2019) The dawn of robot surveillance. In: ACLU (Issue June). https://www.aclu.org/report/dawn-robot-surveillance
Sun P, Hou R, Lynch JP (2020) Measuring the utilization of public open spaces by deep learning: a benchmark study at the Detroit riverfront. ArXiv 1:2228–2237
Talen E, Ellis C (2015) Beyond relativism reclaiming the search for good city form. 36–49
Talen E, Ellis C (2002) Beyond relativism: reclaiming the search for good city form. J Plan Educ Res 22(1):36–49. https://doi.org/10.1177/0739456X0202200104
Venturi R, Brown DS, Izenour S (1972) Learning from Las Vegas. The MIT Press
Whyte W (1980) The social life of small urban spaces. The Conservation Foundation. http://trid.trb.org/view.aspx?id=521122
Winner L (2017) Do artifacts have politics? Routledge
World Economic Forum (2020) The future of the last-mile ecosystem. In: Transition roadmaps for public- and private-sector players (Issue January). https://www.weforum.org/reports/the-future-of-the-last-mile-ecosystem
Yang S, Bailey E, Yang Z, Ostrometzky J, Zussman G, Seskar I, Kostic Z (2020) COSMOS smart intersection: edge compute and communications for bird’s eye object tracking. 2020 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2020. https://doi.org/10.1109/PerComWorkshops48775.2020.9156225
Yatskar M, Zettlemoyer L, Farhadi A (2016) Situation recognition: visual semantic role labeling for image understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-Decem, 5534–5542. https://doi.org/10.1109/CVPR.2016.597
Yin L, Cheng Q, Wang Z, Shao Z (2015) “Big data” for pedestrian volume: exploring the use of google street view images for pedestrian counts. Appl Geogr 63:337–345. https://doi.org/10.1016/j.apgeog.2015.07.010
Zukin S (2020) Seeing like a city: how tech became urban. Theory Soc 49(5–6):941–964. https://doi.org/10.1007/s11186-020-09410-4

Author information

Authors and Affiliations

Graduate School of Architecture, Planning and Preservation, Columbia University, New York, USA
Anthony Vanky & Ri Le

Corresponding author

Correspondence to Anthony Vanky.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vanky, A., Le, R. Urban-semantic computer vision: a framework for contextual understanding of people in urban spaces. AI & Soc 38, 1193–1207 (2023). https://doi.org/10.1007/s00146-022-01625-6

Received: 19 May 2021
Accepted: 16 December 2022
Published: 10 January 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s00146-022-01625-6
