Abstract
Increasing computational power and improving deep learning methods have made computer vision technologies pervasive in urban environments. Their applications in policing, traffic management, and the documentation of public spaces are increasingly common (Ridgeway 2018, Coifman et al. 1998, Sun et al. 2020). Beyond the often-discussed biases in the algorithms' training and their unequally borne benefits (Khosla et al. 2012), almost all applications reduce urban experiences to simplistic and mechanistic measures. These practices lack the context, depth, and specificity that would enable semantic knowledge or analysis within urban contexts, especially with regard to using and occupying urban space. This paper critiques existing uses of artificial intelligence and computer vision in urban practices and proposes a new framework for understanding people, action, and public space. It revisits Geertz's (1973) use of thick description in generating interpretive theories of culture and activity, and uses this lens to establish a framework for evaluating the varied uses of computer vision technologies that weigh meaning. By discussing implemented examples of urban computer vision, from LinkNYC and Numina's urban measurements to the Detroit Police's use of DataWorks Plus's facial recognition technology, it proposes a framework for evaluating the thickness of an algorithm's conclusions against the complexity of the computational method required to produce that outcome. Further, we discuss how the framework's positioning may differ (and conflict) among different users of the technology, from engineer to urban planner and policymaker to citizen.
This paper also discusses how the current use and training of deep learning algorithms limit semantic learning, and proposes three potential methodologies for gaining a more contextually specific, urban-semantic description of urban space relevant to urbanists. It contributes to the critical conversations regarding the proliferation of artificial intelligence by challenging current applications of these technologies in the urban environment and highlighting their failures within this context, while also proposing an evolution of these algorithms that may ultimately make them sensitive and useful within this spatial and cultural milieu.
Data availability
No datasets were generated or analyzed during the current study.
Notes
“Imageable” here is taken to mean a cognitive, memory-based image as was used in the previous reference of Lynch’s work.
These together form the commonly used “four V’s” of big data: velocity, veracity, volume and variety.
While this paper will not comprehensively review the technology and its application depth, other papers have sought to categorize various approaches. See Ibrahim et al. 2020, for instance.
References
ABI Research (2021) Deep learning-based machine vision in smart cities. https://www.abiresearch.com/press/global-installed-base-smart-city-cameras-ai-chipset-reach-over-350-million-2025/
Ackerman D (2017) Google maps street view celebrates its 10th birthday. CNet. https://www.cnet.com/news/google-maps-street-view-celebrates-its-10th-birthday/
ACLU NY (2016) NYCLU: city’s public wi-fi raises privacy concerns.
Al-Faris M, Chiverton J, Ndzi D, Ahmed AI (2020) A review on computer vision-based methods for human action recognition. J Imag. https://doi.org/10.3390/jimaging6060046
Anguelov D, Dulong C, Filip D, Frueh C, Lafon S, Lyon R, Ogale A, Vincent L, Weaver J (2010) Google street view: capturing the world at street level. Computer 43(6):32–38. https://doi.org/10.1109/MC.2010.170
Attribute detection with Body Camera Analytics (2020) IBM intelligent video analytics documentation. https://www.ibm.com/docs/en/iva/2.0.0?topic=video-attribute-detection-body-camera-analytics
Azar M, Cox G, Impett L (2021) Introduction: ways of machine seeing. AI and society. Springer Science and Business Media Deutschland GmbH, Berlin, pp 1–12. https://doi.org/10.1007/s00146-020-01124-6
Berlyne DE (1971) Aesthetics and psychobiology. Appleton-Century-Crofts
Brannen J (2005) Mixing methods: the entry of qualitative and quantitative approaches into the research process. Int J Soc Res Methodol Theory Pract 8(3):173–184. https://doi.org/10.1080/13645570500154642
Brill M (1989) An ontology for exploring urban public life today. Places 6(1):24–31. http://escholarship.org/uc/item/4kc602c7
byronv2 (2019) Texting one another [photograph]. Flickr. https://flic.kr/p/23B3Jc4
byronv2 (2020a) Ice Cream Time [Photograph]. Flickr. https://flic.kr/p/2jjDBQv
byronv2 (2020b) Lunch al Fresco [Photograph]. Flickr. https://flic.kr/p/2iEczU1
Chetan V (2019) Man jumping from a rock [photograph]. Pexels. https://www.pexels.com/photo/man-jumping-from-a-rock-2923157/
Chidister M (1989) Public places, private lives: plazas and the broader public. Places 6(1):32–37. http://escholarship.org/uc/item/9gr5n6hd
Collins RL (2011) Content analysis of gender roles in media: where are we now and where should we go? Sex Roles 64(3):290–298. https://doi.org/10.1007/s11199-010-9929-5
Collins J (2020) Police bodycam video shows George Floyd’s distress during fatal arrest. NPR. https://www.npr.org/2020/07/15/891516654/police-bodycam-video-provides-fuller-picture-of-george-floyds-fatal-arrest
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 3213–3223. https://doi.org/10.1109/CVPR.2016.350
Crawford K (2018) Artificial intelligence’s white guy problem. The New York Times. https://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html
Czarniawska B (1992) Exploring complex organizations: a cultural perspective: toward an anthropological perspective. SAGE, Singapore
Dahlberg L (2015) Charles Marville, photographer of Paris/piercing time: Paris after Marville and atget, 1865–2012. Hist Photogr 39(2):194–196. https://doi.org/10.1080/03087298.2015.1035533
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 20(11): 248–255. https://doi.org/10.1109/CVPR.2009.5206848
Desmond R, Danilewicz A (2010) Women are on, but not in, the news: gender roles in local television news. Sex Roles 62(11):822–829. https://doi.org/10.1007/s11199-009-9686-5
Dreyfus HL (1992) What computers still can’t do: a critique of artificial reason. The MIT Press
Duarte F, DeSouza P (2020) Data science and cities: a critical approach. Harvard Data Sci Rev. https://doi.org/10.1162/99608f92.b3fc5cc8
Eagle N, Pentland AS (2009) Eigenbehaviors: identifying structure in routine. 1057–1066. https://doi.org/10.1007/s00265-009-0739-0
Garvie C, Moy LM (2019) America under watch. https://www.americaunderwatch.com/
Geertz C (1973) Thick description: toward an interpretive theory of culture. In: Turning points in qualitative research: Tying knots in a handkerchief, pp 143–168
Gehl J (1987) Life between buildings: using public space. Island Press
Gershenson C (2013) The implications of interactions for science and philosophy. Found Sci 18(4):781–790. https://doi.org/10.1007/s10699-012-9305-8
Gill KS (2020) Prediction paradigm: the human price of instrumentalism. AI Soc 35(3):509–517. https://doi.org/10.1007/s00146-020-01035-6
Girardin F, Calabrese F, Fiore FD, Ratti C, Blat J (2008) Digital footprinting: uncovering tourists with user-generated content. IEEE Pervasive Comput 7(4):36–43. https://doi.org/10.1109/MPRV.2008.71
Goldsmith S, Crawford S (2014) The city as digital platform. In: The responsive city. Jossey-Bass
Greenfield A (2013) Against the smart city. Do Projects
Hand DJ (2020) Dark data: why what you don’t know matters. Princeton University Press
Harwell D (2019) Ring, the doorbell-camera firm, has partnered with 400 police forces, extending surveillance reach. The Washington Post. https://www.washingtonpost.com/technology/2019/08/28/doorbell-camera-firm-ring-has-partnered-with-police-forces-extending-surveillance-reach/
Hernandez J, Hoque M, Drevo W, Picard RW (2012) Mood meter: counting smiles in the wild. Proceedings of the 2012 ACM Conference on Ubiquitous Computing - UbiComp ’12, 301. https://doi.org/10.1145/2370216.2370264
Hill K (2020) Wrongfully accused by an algorithm. The New York Times. https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html
Hinchcliffe T (2010) Aerial photography and the Postwar urban planner in London. Lond J 35(3):277–288. https://doi.org/10.1179/174963210X12814015170232
hjl (2012) Blind date—green park [photograph]. Flickr. https://flic.kr/p/cBGctS
Hollands RG (2008) Will the real smart city please stand up? City 12(3):303–320. https://doi.org/10.1080/13604810802479126
Ibrahim MR, Haworth J, Cheng T (2020) Understanding cities with machine eyes: a review of deep computer vision in urban analytics. Cities 96:102481. https://doi.org/10.1016/j.cities.2019.102481
Idrees H, Zamir AR, Jiang Y-G, Gorban A, Laptev I, Sukthankar R, Shah M (2017) The THUMOS challenge on action recognition for videos “in the wild.” Comput vis Image Underst 155:1–23. https://doi.org/10.1016/j.cviu.2016.10.018
Jacobs J (1970) The economy of cities. Random House
Jacobs A, Appleyard D (1987) Toward an urban design manifesto. J Am Plann Assoc 53(1):112–120. https://doi.org/10.1080/01944368708976642
Jacobs J (1961) The death and life of great American cities. Vintage Books. https://books.google.com/books?hl=en&lr=&id=P_bPTgOoBYkC&oi=fnd&pg=PA7&ots=JW1O38Fpf5&sig=X-9dkYK56vjYblU9O1I-kh0yYFQ#v=onepage&q&f=false
Jemielniak D (2020) Thick big data. Oxford University Press, Oxford. https://doi.org/10.1093/oso/9780198839705.001.0001
Jiang S, Fiore GA, Yang Y, Ferreira J, Frazzoli E, González MC (2013) A review of urban computing for mobile phone traces: current methods, challenges and opportunities. UrbComp
Kirchner L, Mattu S, Larson J, Angwin J (2016) Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Kofman A (2018) Are New York’s free LinkNYC internet kiosks tracking your movements? The Intercept. https://theintercept.com/2018/09/08/linknyc-free-wifi-kiosks/
Krasin I, Duerig T, Alldrin N, Ferrari V, Abu-El-Haija S, Kuznetsova A, Rom H, Uijlings J, Popov S, Kamali S, Malloci M, Pont-Tuset J, Veit A, Bel K (2017) OpenImages: A public dataset for large-scale multi-label and multi-class image classification. https://storage.googleapis.com/openimages/web/index.html
Kubo M, Pasnik M, Grimley C (2010) Tough love: in defense of brutalism. Architect Magazine. https://www.architectmagazine.com/design/tough-love-in-defense-of-brutalism_o
Kwet M (2020) The rise of the video surveillance industrial complex. The Intercept. https://theintercept.com/2020/01/27/surveillance-cctv-smart-camera-networks/
Le Corbusier (1935) Aircraft. The Studio.
Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
Lee TB (2020) Detroit police chief cops to 96-percent facial recognition error rate. Ars Technica. https://arstechnica.com/tech-policy/2020/06/detroit-police-chief-admits-facial-recognition-is-wrong-96-of-the-time/
Li X, Zhang C, Li W, Ricard R, Meng Q, Zhang W (2015) Assessing street-level urban greenery using Google street view and a modified green view index. Urban for Urban Green 14(3):675–685. https://doi.org/10.1016/j.ufug.2015.06.006
Lin L, Purnell N (2019) A world with a billion cameras watching you is just around the corner. Wall Street J. https://www.wsj.com/articles/a-billion-surveillance-cameras-forecast-to-be-watching-within-two-years-11575565402
Lynch K (1960) The image of the city. MIT Press
IHS Markit (2019) Security technologies top trends for 2019. In: IHS markit security technologies. https://technology.informa.com/Research-by-Market/551540/security-technology
Massaro E, Ahn C, Ratti C, Santi P, Stahlmann R, Lamprecht A, Roehder M, Huber M (2017) The car as an ambient sensing platform [point of view]. Proc IEEE 105(1):3–7. https://doi.org/10.1109/JPROC.2016.2634938
Mayor’s Office for New Urban Mechanics (2018) Beta blocks. City of Boston
McDuff D, El Kaliouby R, Demirdjian D, Picard R (2013a) Predicting online media effectiveness based on smile responses gathered over the Internet. 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013. https://doi.org/10.1109/FG.2013.6553750
McDuff D, El Kaliouby R, Senechal T, Amr M, Cohn JF, Picard R (2013b) Affectiva-MIT facial expression dataset (AM-FED): naturalistic and spontaneous facial expressions collected “in-the-wild.” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 881–888. https://doi.org/10.1109/CVPRW.2013.130
Mozur P (2019) One month, 500,000 face scans: how China is using A.I. to profile a minority. The New York Times. https://www.nytimes.com/2019/04/14/technology/china-surveillance-artificial-intelligence-racial-profiling.html
Naik N, Philipoom J (2014) Streetscore-predicting the perceived safety of one million streetscapes. Proc IEEE. https://doi.org/10.1109/CVPRW.2014.121
Norden E (1969) Marshall McLuhan—a candid conversation with the high priest of popcult and metaphysician of media. Essential McLuhan 2:233–270
Noueihed L (2011) Peddler’s martyrdom launched Tunisia’s revolution. Reuters. https://www.reuters.com/article/tunisia-protests-bouazizi-idAFLDE70G18J20110119
O’Hara S, Lui YM, Draper BA (2011) Unsupervised learning of human expressions, gestures, and actions. Face Gest 2011:1–8. https://doi.org/10.1109/FG.2011.5771473
Offenhuber D, Nabian N, Vanky A, Ratti C (2013) Data dimension: accessing urban data and making it accessible. Proc ICE Urban Des Plann 166(1):60–75. https://doi.org/10.1680/udap.12.00011
Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R (2013) Berkeley MHAD: a comprehensive multimodal human action database. IEEE Workshop Appl Comput vis (WACV) 2013:53–60. https://doi.org/10.1109/WACV.2013.6474999
Paglen T (2016) Invisible images (your pictures are looking at you). The New Inquiry. https://thenewinquiry.com/invisible-images-your-pictures-are-looking-at-you/
Pasquinelli M (2015) Anomaly detection: the mathematization of the abnormal in the metadata society. Transmed Festiv 2:1–10
Patron-Perez A, Marszalek M, Reid I, Zisserman A (2012) Structured learning of human interactions in TV shows. IEEE Trans Pattern Anal Mach Intell 34(12):2441–2453. https://doi.org/10.1109/TPAMI.2012.24
Picard RW (1995) Affective computing. In: Perceptual computing section technical reports (Issue 221)
Pickles J (1997) Tool or science? GIS, technoscience, and the theoretical turn. Ann Assoc Am Geogr 87(2):363–372. https://doi.org/10.1111/0004-5608.00058
Rice S (1997) Parisian views. The MIT Press
Rossman GB, Rallis SF (2017) An introduction to qualitative research: learning in the field, 4th edn. SAGE Publications Inc, Singapore. https://doi.org/10.4135/9781071802694
Salesses P, Schechtner K, Hidalgo C (2013) The collaborative image of the city: mapping the inequality of urban perception. PLoS ONE. https://doi.org/10.1371/journal.pone.0068400
Schwarzer M (2017) Computation and the impact of new technologies on the photography of architecture and urbanism. Architect MPS. https://doi.org/10.14324/111.444.amps.2017v11i4.001
Seer S, Brändle N, Ratti C (2014) Kinects and human kinetics: a new approach for studying pedestrian behavior. Transport Res Part C Emerg Technol 48:212–228. https://doi.org/10.1016/j.trc.2014.08.012
Selinger E, Fox Cahn A (2020) Did you protest recently? Your face might be in a database. The Guardian. https://www.theguardian.com/commentisfree/2020/jul/17/protest-black-lives-matter-database
Shankar S, Halpern Y, Breck E, Atwood J, Wilson J, Sculley D (2017) No classification without representation: assessing geodiversity issues in open data sets for the developing world. ArXiv. http://arxiv.org/abs/1711.08536
Shepardson D (2020) IBM says U.S. should adopt new export controls on facial recognition systems. Reuters. https://www.reuters.com/article/us-ibm-facial-recognition-exports/ibm-says-u-s-should-adopt-new-export-controls-on-facial-recognition-systems-idUSKBN2621PV
Smaira L, Carreira J, Noland E, Clancy E, Wu A, Zisserman A (2020) A short note on the kinetics-700–2020 human action dataset. ArXiv. http://arxiv.org/abs/2010.10864
Soomro K, Shah M (2017) Unsupervised action discovery and localization in videos. IEEE Int Conf Comput vis (ICCV) 2017:696–705. https://doi.org/10.1109/ICCV.2017.82
Spatial Analysis Lab (2019) Ethnicity linguistic landscape data. https://slab.today/2019/09/ethnicity-lld/
Stanley J (2019) The dawn of robot surveillance. In: ACLU (Issue June). https://www.aclu.org/report/dawn-robot-surveillance
Sun P, Hou R, Lynch JP (2020) Measuring the utilization of public open spaces by deep learning: a benchmark study at the detroit riverfront. ArXiv 1:2228–2237
Talen E, Ellis C (2015) Beyond relativism: reclaiming the search for good city form. 36–49
Talen E, Ellis C (2002) Beyond relativism: reclaiming the search for good city form. J Plan Educ Res 22(1):36–49. https://doi.org/10.1177/0739456X0202200104
Venturi R, Brown DS, Izenour S (1972) Learning from Las Vegas. The MIT Press
Whyte W (1980) The social life of small urban spaces. The Conservation Foundation. http://trid.trb.org/view.aspx?id=521122
Winner L (2017) Do artifacts have politics? Routledge
World Economic Forum (2020) The future of the last-mile ecosystem. In: Transition roadmaps for public- and private-sector players (Issue January). https://www.weforum.org/reports/the-future-of-the-last-mile-ecosystem
Yang S, Bailey E, Yang Z, Ostrometzky J, Zussman G, Seskar I, Kostic Z (2020) COSMOS smart intersection: edge compute and communications for bird’s eye object tracking. 2020 IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2020. https://doi.org/10.1109/PerComWorkshops48775.2020.9156225
Yatskar M, Zettlemoyer L, Farhadi A (2016) Situation recognition: visual semantic role labeling for image understanding. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 5534–5542. https://doi.org/10.1109/CVPR.2016.597
Yin L, Cheng Q, Wang Z, Shao Z (2015) “Big data” for pedestrian volume: exploring the use of google street view images for pedestrian counts. Appl Geogr 63:337–345. https://doi.org/10.1016/j.apgeog.2015.07.010
Zukin S (2020) Seeing like a city: how tech became urban. Theory Soc 49(5–6):941–964. https://doi.org/10.1007/s11186-020-09410-4
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Vanky, A., Le, R. Urban-semantic computer vision: a framework for contextual understanding of people in urban spaces. AI & Soc 38, 1193–1207 (2023). https://doi.org/10.1007/s00146-022-01625-6
DOI: https://doi.org/10.1007/s00146-022-01625-6