Towards Human-AI Collaborative Urban Science Research Enabled by Pre-trained Large Language Models

Pre-trained large language models (PLMs) have the potential to support urban science research through content creation, information extraction, assisted programming, text classification, and other technical advances. In this research, we explored the opportunities, challenges, and prospects of PLMs in urban science research. Specifically, we discussed potential applications of PLMs to urban institution, urban space, urban information, and citizen behaviors research through seven examples using ChatGPT. We also examined the challenges of PLMs in urban science research from both technical and social perspectives. The prospects of the application of PLMs in urban science research were then proposed. We found that PLMs can effectively aid in understanding complex concepts in urban science, facilitate urban spatial form identification, assist in disaster monitoring, and sense public sentiment. At the same time, however, the applications of PLMs in urban science research face evident threats, such as technical limitations, security, privacy, and social bias. The development of fundamental models based on domain knowledge and human-AI collaboration may help improve PLMs to support urban science research in future.


Introduction
As the most intricate creation of humankind, cities are complex systems comprising multiple dimensions and factors. Consequently, urban research has evolved into a complex and significant social undertaking (Emmi, 2008; Marshall, 2012). Furthermore, the technological revolution, the proliferation of big data in cities, and the dissemination of artificial intelligence have not only transformed cities but have also altered the manner in which urban researchers investigate them (Wang and Yin, 2023). Technologies such as Machine Learning (ML), Deep Learning (DL), and their applications in Natural Language Processing (NLP) and Computer Vision (CV) have been widely used in urban science research (Cai, 2021; Casali, Yonca, Comes and Casali, 2022; Wang and Biljecki, 2022). These emerging technologies present an opportunity for traditional urban research methodologies and propel urban research in a quantitative, computational, and intelligent direction. However, despite their promise, several obstacles hinder their application, such as poor robustness (Goodfellow, Shlens and Szegedy, 2014), algorithmic and technical constraints (Cai, 2021), and insufficient semantic comprehension (Bender and Koller, 2020). Whether these issues can be resolved by novel technologies or tools is a question worth examining in current urban science research.
PLMs are expected to play a crucial role in advancing urban research through various means, such as simplifying the interpretation of complex urban concepts, automating repetitive programming tasks in urban data analysis, and improving the use of multi-disciplinary knowledge for urban science research (see Figure 1). PLMs will become a potent tool to support urban researchers in their efforts to achieve a new level of depth in urban research. In this paper, we use ChatGPT as a tool to investigate the opportunities and challenges associated with PLMs in urban research. The structure of this paper is as follows: In the second section, we outline the possible contributions of PLMs to research on urban institution, urban space, urban information, and citizen behaviors. We then examine potential issues and challenges facing PLMs in urban research from both technical and social perspectives. Finally, we explore possible directions for PLMs in future urban research.

Urban institution
The domain of urban institutional research comprises a wide range of topics, including but not limited to institutional design, public policy development, public comprehension of policy, and public policy response (Farazmand, 2023). This involves handling a significant amount of text-based data. As AI models, PLMs offer strong capabilities in intelligent question answering, text classification, and text generation. PLMs can comprehend the researcher's queries and respond in accurate and lucid language in both restricted-domain and open-domain Q&A (Wu et al., 2023). PLMs can also extract crucial information from an urban institutional document to summarize its main content (Min, Ross, Sulem, Veyseh, Nguyen, Sainz, Agirre, Heinz and Roth, 2021). Moreover, text classification is a distinct advantage of PLMs. PLMs can distinguish positive from negative sentiment in texts, which helps researchers obtain prompt public feedback on urban institutions and discern the public's key requirements for institutions or policies (Karduni and Sauda, 2020), and enables policymakers to comprehend the underlying reasons for public endorsement of or opposition to urban institutions (Luo, Tong, Fang and Qu, 2019). This use of public opinion helps to advance the construction of urban institutions.
We demonstrate the potential of PLMs in urban institution research through two examples. PLMs assist urban researchers in tasks such as information retrieval, summarization, and tracking of urban institutions and documents. In the example shown in Figure 2, we used ChatGPT to obtain five institutional documents concerning urban land use. With the assistance of the WebGPT plug-in, ChatGPT retrieved and cited the relevant documents.
In addition, PLMs possess remarkable capabilities for aggregation, allowing pertinent information and key points to be extracted automatically from extensive materials. As an illustration, we provided ChatPDF (an open tool based on the ChatGPT API) with a report by the President's Council of Advisors on Science and Technology (PCAST), Technology and the Future of Cities. We asked ChatPDF to extract and summarize the primary points of the document and to answer specific questions on particular topics. ChatPDF sorted and condensed the content of the report as requested. It also located the specific content in the institutional document and answered accordingly, while indicating the source of each answer (see Figure 3).
PLMs are capable of acquiring and elucidating complex urban concepts, which can be particularly useful for researchers without a background in urban research. They can explain relevant terminology without requiring additional context. For instance, we asked ChatGPT to explain several concepts that we had identified as relevant to urban research, such as "Spatial Planning", "Metropolitan Area", "Smart City", and "Carbon Neutral". As expected, ChatGPT provided precise and accurate explanations of these concepts (see Figure 4).

Urban space
The study of urban space covers multiple dimensions such as geographic location, spatial form, spatial structure, land use, architectural form, and urban landscape (Koumetio Tekouabou, Diop, Azmi and Chenal, 2023; Sharifi, Khavarian-Garmsir, Allam and Asadzadeh, 2023). These dimensions involve diverse textual and non-textual data sources. The advent of PLMs presents novel approaches for integrating multi-source data in the study of urban space. Owing to its advanced natural language processing capabilities, ChatGPT can carry out tasks such as code generation and modification (Merow, Serra-Diaz, Enquist and Wilson, 2023; Sobania, Briesch, Hanna and Petke, 2023). This enhances the efficiency of urban spatial research by assisting with programming and streamlining the integration of novel data sources such as cell phone signaling and points of interest (POI). As an illustrative example, we used POI data to delineate central city boundaries. In this process, PLMs can assist with remote sensing imagery analysis, kernel density analysis, and other methods. For instance, we could ask ChatGPT for guidance on "batch processing image cropping using ArcGIS, along with Python code and explanations," and use ArcPy to execute the command (see Figure 5). PLMs can also assist with POI data crawling. We can make a request to ChatGPT: "How can we crawl POI data with permission through the AMap (http://lbs.amap.com/) API?" ChatGPT can then provide the relevant code for POI data crawling (see Figure 6). The ability of PLMs to assist with programming also makes them useful in urban streetscape recognition, for example by helping to build models such as convolutional neural networks (CNNs) for urban landscape recognition. For instance, we used ChatGPT to help us construct a CNN model in Python for identifying street trees in urban streetscapes (see Figure 7).
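The kernel density step mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code shown in the figures: the POI coordinates, grid, bandwidth, and function names are hypothetical, the coordinates are assumed to be already projected to metres, and the Gaussian kernel is only one common choice.

```python
import math

def kernel_density(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel density of 2-D points evaluated on a regular grid.

    points    : list of (x, y) POI coordinates in projected units (assumed metres)
    grid_x/y  : lists of grid coordinates along each axis
    bandwidth : kernel bandwidth in the same units (a tuning assumption)
    Returns a dict mapping (gx, gy) to the estimated density.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    density = {}
    for gx in grid_x:
        for gy in grid_y:
            total = 0.0
            for px, py in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            density[(gx, gy)] = norm * total
    return density

def cells_above(density, threshold):
    """Grid cells whose density meets a threshold: a crude proxy for a central extent."""
    return sorted(cell for cell, value in density.items() if value >= threshold)

# Hypothetical POIs: a tight cluster near (0, 0) plus one distant outlier.
pois = [(0, 0), (100, 50), (-80, 30), (40, -60), (5000, 5000)]
grid = [-1000, 0, 1000, 5000]
dens = kernel_density(pois, grid, grid, bandwidth=500)
peak = max(dens, key=dens.get)  # grid cell with the highest POI density
```

In practice the threshold used to turn the density surface into a boundary is a research decision, and a GIS tool or library would normally replace the nested loops for realistic grid sizes.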

Urban information
Urban information refers to information generated by a multitude of data sources such as information and communication technologies (ICT), remote and physical sensors, and individuals (Wang and Yin, 2023), and encompasses a wide range of topics including urban traffic, logistics, environment, disasters, and various types of urban economic information (Ismagilova, Hughes, Dwivedi and Raman, 2019). PLMs can help monitor and predict natural disasters or public health events. Firstly, text mining, an important function of PLMs, can identify and extract disaster-related information from diverse sources, such as news articles, social media, and emergency reports. This information includes the time, location, and magnitude of the disaster. Secondly, the natural language reasoning capabilities of PLMs can aid in solving various comprehension and reasoning tasks, including scenario estimation for disaster monitoring and the generation of corresponding monitoring reports (Zheng, Abdel-Aty, Wang, Wang and Ding, 2023). Additionally, time series analysis of disaster texts can support disaster prediction. In the example shown in Figure 8, we supplied ChatGPT with a text describing a disaster (extracted from a web report of the 2022 floods in Assam, India), and asked it to identify the time and location of the disaster and provide location details.
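The extraction task described above can be framed as a structured prompt plus a parser for the reply. The sketch below is hypothetical throughout: the prompt wording, the JSON keys, and both function names are assumptions for illustration, and the call to the model itself is deliberately left out, since it depends on the service being used.

```python
import json

# Hypothetical prompt template; the wording and the JSON keys are assumptions.
PROMPT_TEMPLATE = (
    "Read the report below and answer in JSON with the keys "
    '"time", "location", and "magnitude". Use null for anything not stated.\n\n'
    "Report:\n{report}"
)

def build_extraction_prompt(report):
    """Wrap a disaster report in the extraction instructions."""
    return PROMPT_TEMPLATE.format(report=report.strip())

def parse_extraction_reply(reply):
    """Parse the model's reply, tolerating extra text around the JSON object."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    # Keep only the expected keys; anything the model omitted becomes None.
    return {key: data.get(key) for key in ("time", "location", "magnitude")}
```

Asking for a fixed set of keys makes replies easier to check and to aggregate across many reports, but the extracted values still need to be verified against the source text.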
PLMs can assist in forecasting urban information, including housing prices, by drawing on various data sources such as demographic data, real estate listings, and local economic indicators. Moreover, with the aid of assisted programming, we can perform data analysis to forecast future house prices in a particular area. As an illustration, we could ask ChatGPT to construct a random forest model to predict the future trend of housing prices and provide us with the code for this prediction (see Figure 9).
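To make the idea behind such a model concrete, the sketch below implements a bagged ensemble of one-split regression trees in plain Python. It is a simplified stand-in for a random forest, not the code from Figure 9: the listings and prices are synthetic, the function names are hypothetical, and a real analysis would normally use a library implementation such as scikit-learn's RandomForestRegressor with deeper trees.

```python
import random

def fit_stump(rows, targets, rng):
    """Fit a one-split regression tree on a single randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    mean_all = sum(targets) / len(targets)
    best = None  # (sse, threshold, left_mean, right_mean)
    # Candidate thresholds exclude the largest value so both sides are non-empty.
    for threshold in sorted({row[feature] for row in rows})[:-1]:
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # the sampled feature was constant: fall back to the mean
        return lambda row: mean_all
    _, threshold, lm, rm = best
    return lambda row: lm if row[feature] <= threshold else rm

def fit_bagged_stumps(rows, targets, n_trees=25, seed=0):
    """Average many stumps, each fitted on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], rng))
    return lambda row: sum(s(row) for s in stumps) / len(stumps)

# Synthetic listings: (floor area, distance to centre) with a made-up price rule.
listings = [(a, d) for a in (40, 60, 80, 100, 120) for d in (2, 5, 10, 15)]
prices = [2.0 * a - 5.0 * d for a, d in listings]
predict = fit_bagged_stumps(listings, prices)
```

Calling `predict((90, 4))` returns an averaged estimate for a hypothetical listing. Because every stump predicts a mean of observed prices, the ensemble cannot extrapolate beyond the price range in its training data, a limitation shared by tree-based forecasts in general.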

Citizen behaviors
Research on citizen behaviors in cities covers issues such as public sentiment, population mobility, travel behavior, poverty, and crime (Sharifi et al., 2023). PLMs can help to study these issues. With their remarkable language processing capabilities, PLMs can parse social media texts and discern public sentiment (Zhang, Ding and Jing, 2022). They can analyze the sentiment of posts, online comments, and various types of news or stories, and categorize them as positive or negative, for or against. In addition, PLMs can analyze sentiment trends over time and detect significant changes in public opinion (Wang et al., 2023a). This capability facilitates a comprehensive analysis of shifts in public sentiment towards an event or a location and encourages the use of social media in urban research (Abdul-Rahman et al., 2021). As an illustration, we presented ChatGPT with a set of paragraphs describing the use of the OpenAI API and its Tweet classifier to perform sentiment analysis on a comment about a certain park. ChatGPT accurately identified the sentiment tendencies present in the comment (see Figure 10).
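Once posts have been labeled as positive or negative, tracking sentiment over time reduces to simple aggregation. The sketch below assumes that the labels already exist (for example, produced by a PLM classifier); the month-level grouping, the 0.3 change threshold, and the function names are illustrative assumptions.

```python
from collections import defaultdict

def positive_share_by_month(posts):
    """posts: iterable of (date as 'YYYY-MM-DD', label), label 'positive' or 'negative'."""
    counts = defaultdict(lambda: [0, 0])  # month -> [positive count, total count]
    for date, label in posts:
        month = date[:7]
        counts[month][1] += 1
        if label == "positive":
            counts[month][0] += 1
    return {m: pos / total for m, (pos, total) in sorted(counts.items())}

def flag_shifts(shares, min_change=0.3):
    """Months whose positive share moved by at least min_change versus the previous month."""
    months = list(shares)
    return [
        (cur, round(shares[cur] - shares[prev], 2))
        for prev, cur in zip(months, months[1:])
        if abs(shares[cur] - shares[prev]) >= min_change
    ]
```

The comparison is between consecutive months that appear in the data, so months with no posts are skipped rather than treated as zero; with few posts per month, the shares are also too noisy to interpret on their own.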
We summarize the possible applications of PLMs to urban institution, urban space, urban information, and citizen behaviors (see Table 1).

Technical restrictions
Time restrictions: PLMs require vast amounts of data to train their initial models, and that data ends at a fixed cutoff date. For instance, ChatGPT's training data only goes up to June 2021 (Zhu et al., 2023). This means that ChatGPT can only understand and infer information from 2021 and earlier, making it challenging to keep up with the most current data (Teubner, Flath, Weinhardt, van der Aalst and Hinz, 2023). Consequently, when asked to provide ten authoritative papers on urban research, ChatGPT was unable to provide current research papers because of this training data cutoff (see Figure 11).
Permission restrictions: The issue of data restrictions in PLMs is further exacerbated by the incompleteness and inaccessibility of big data (Salganik, 2019). Although ChatGPT can search the web and annotate its answers with citation sources when using the WebGPT plug-in, there are still significant limitations in data collection, such as inaccessible cell phone signaling and travel data. This hinders researchers from using PLMs to obtain authoritative information for urban studies. In the example shown in Figure 12, when studying urban demographic characteristics, we asked ChatGPT about the current demographic characteristics of each province in China. However, ChatGPT indicated that this information was unavailable due to training data limitations and access restrictions. This indicates that researchers still need to retrieve data manually from specialized databases instead of relying solely on PLMs when conducting research on recent data.
Modality restrictions: Presently, the multimodality of PLMs is mainly exhibited in the inference and analysis of data and text. For other modes such as images, audio, and video, plug-ins and auxiliary programming are often required (Yang et al., 2023). It is also challenging for PLMs to recognize remote sensing images directly in urban research, and difficult to apply them to research on urban soundscapes and urban images.

Authenticity and validity
On the one hand, PLMs may generate false information, including fabricated literature (Haluza and Jungwirth, 2023) and factual errors (hallucinations) (Wu et al., 2023), particularly in low-resource settings. On the other hand, the performance of PLMs is not consistent and stable, which may result in disparate responses to the same query (Liu, Han, Ma, Zhang, Yang, Tian, He, Li, He, Liu et al., 2023). For instance, when we asked for "information on the fourth census of China", ChatGPT provided wholly inconsistent data, which could lead to entirely erroneous conclusions in urban studies (see Figure 13). ChatGPT therefore does not currently offer a precise and dependable source of information for urban research, nor can it effectively integrate diverse types of knowledge. To ensure the reliability and accuracy of PLM output, particularly for temporal and numerical questions such as urban time series change and population change, a more rigorous validation approach is necessary.
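One simple check in this direction is to pose the same query several times and compare the numbers in the replies. The sketch below is only a first-pass screen under stated assumptions, not a validation method proposed in this paper: the function names are hypothetical, the replies are assumed to have been collected already, and agreement between replies does not show that the agreed figure is correct, so it still has to be compared with an authoritative source.

```python
import re
from collections import Counter

def extract_numbers(text):
    """Pull numeric values out of a reply, ignoring thousands separators."""
    return [float(m.replace(",", "")) for m in re.findall(r"\d[\d,]*(?:\.\d+)?", text)]

def agreement(replies):
    """Return the most common numeric answer and the share of replies that give it."""
    answers = [tuple(extract_numbers(reply)) for reply in replies]
    most_common, count = Counter(answers).most_common(1)[0]
    return most_common, count / len(answers)

# Made-up replies to one repeated query; the figures are placeholders, not real data.
replies = [
    "The district has 1,000 residents.",
    "The district has 1,000 residents.",
    "The district has 1,200 residents.",
]
answer, share = agreement(replies)
needs_review = share < 1.0  # any disagreement sends the answer to manual checking
```

Here two of the three placeholder replies agree, so the answer is flagged for review rather than accepted.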

Comprehension skills
PLMs, being a form of artificial intelligence, are essentially based on inferences about statistical relationships and currently lack the higher-order thinking skills to understand context and nuance (Liu et al., 2023). In the context of complex urban research, this shortcoming can lead to the production of inaccurate data and misinterpretations (Kooli, 2023), resulting in responses that lack depth and insight or even deviate from the intended topic (Farrokhnia, Banihashem, Noroozi and Wals, 2023).
Furthermore, most PLMs are trained using general-purpose data sources, such as Wikipedia, which can limit their effectiveness in specific domains (Qiu et al., 2020). For instance, when prompted to provide information on the "evolutionary patterns of Chinese landscape", ChatGPT could only offer superficial observations and struggled to grasp the underlying evolutionary patterns (see Figure 14). Consequently, generic PLMs continue to face limitations in comprehending intricate urban theories or patterns. While there are PLMs that specialize in geography, such as ERNIE-GeoL, GeoBERT, and SpaBERT, their current use in urban research is restricted by access permissions and by functionality limited to tasks such as POI classification and matching, address segmentation, and geographic entity coding.

Lack of trust
The technical black box is an important feature of AI development (Yigitcanlar and Cugurullo, 2020). PLMs such as ChatGPT are capable of providing feedback to users, yet they are incapable of elucidating the computational process that underlies their decision-making and predictive capabilities (Sanderson, 2023). This limitation leads to a dilemma in the application of PLMs in urban research.
On the one hand, PLMs cannot guarantee the source and reference of the information they generate. This opacity could have significant consequences for NLP tasks in urban research that demand high precision. On the other hand, the public, one of the focal groups of urban research, may not be confident that their private information is kept out of data retrieval and processing, which undermines public trust in PLMs. Consequently, PLMs need to improve their transparency and traceability, through algorithmic optimization or legal regulation, to meet the expectations of both researchers and the public.

Social bias and discrimination
The training data for PLMs is typically obtained from publicly available web resources. However, there is a significant amount of biased data on the internet, including information related to race, religion, and gender, among others (Buolamwini and Gebru, 2018). This bias can persist and be reflected in PLMs after training (Farrokhnia et al., 2023; Jungwirth and Haluza, 2023). Such biases in the model can harm the affected groups by perpetuating stereotypes and derogatory images (Brown, Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell et al., 2020). Furthermore, the population that uses internet resources has particular group characteristics, resulting in training samples that are biased and fail to reflect accurately the requirements of marginalized groups (Yang et al., 2023). Yet one of the purposes of urban research is to promote the equal and sustainable development of urban citizens (Meerow, Pajouhesh and Miller, 2019). Discrimination and prejudice can cause significant social harm, even with minor deviations (Liang, Bommasani, Lee, Tsipras, Soylu, Yasunaga, Zhang, Narayanan, Wu, Kumar et al., 2022), resulting in unreasonable allocation of urban space, unjust public decision-making, and a widening urban-rural divide.

Threat to information safety
Security and privacy are key issues to consider in the applications of PLMs to urban research. Because PLMs may use researchers' queries and input data as training material (Clarke, 2023), they can give rise to data leakage and data theft, exposing data about individuals and cities and thereby threatening personal privacy and city security. PLMs such as ChatGPT could also be misused to steal personal information from cities or the public, for example through phishing emails and malware (Wu et al., 2023). Furthermore, the data on which PLMs are trained may be biased or erroneous, potentially yielding harmful output. PLMs are highly communicative and interactive. If harmful content is disseminated in large quantities, it can trigger a serious "infodemic" phenomenon (De Angelis, Baglivo, Arzilli, Privitera, Ferragina, Tozzi and Rizzo, 2023; Zarocostas, 2020), generating mass anxiety, hate speech, and even urban riots, thereby jeopardizing urban public safety. Consequently, researchers should be circumspect about the sensitive information they provide to PLMs, while also considering the security of PLMs' answers and strengthening the safety of urban and personal information.

Future directions
Based on the above exploration of the opportunities and challenges surrounding PLMs applied to urban research, we put forward several potential avenues that can enhance the role of PLMs in urban science research. First of all, fundamental models based on urban research areas can be developed. Given the need for extensive multimodal applications in urban research, coupled with the restrictions on using current models, the development of fundamental models within the realm of urban research could emerge as a novel avenue (Wang et al., 2023a). This approach would incorporate multiple modalities, such as text, data, image, audio, and video, to extend the use of multi-source big data in urban research. Fundamental models customized for urban research could also enhance the accuracy and precision of results, facilitating more intricate urban research tasks, such as the exploration of complex urban theories and laws.
Secondly, human-AI collaboration can be applied to facilitate urban research. The text analysis, abstract summarization, and assisted programming capabilities of PLMs have the potential to significantly enhance the research efficiency of urban researchers. PLMs can help strengthen the academic exchange of urban research and enhance the diversity of perspectives (van Dis et al., 2023). Furthermore, the integration of PLMs with emerging techniques such as deep learning can aid researchers in overcoming technical limitations and adapting to new urban research methods in the context of big data. This, in turn, would allow researchers to focus more on urban theoretical research and paradigm innovation. Finally, PLMs are expected to provide technical support for new directions in urban research, such as digital twin cities.
Thirdly, PLMs can be used to improve public participation and urban decision-making. On the one hand, PLMs such as ChatGPT possess natural language interaction capabilities, which can be used to disseminate urban information to the public, thus advancing public comprehension of and participation in urban research (Casares, 2018). PLMs are also expected to promote urban research by understanding cities from a more human perspective through deep learning of public opinions. On the other hand, PLMs are poised to provide crucial assistance for urban decision-making by mitigating the undue impact of subjective factors and proposing ideas for optimizing decisions.
Finally, there is a need to be wary of falsehood, privacy, and liability issues. As previously mentioned, limited data, falsity, and social bias are major concerns that need to be addressed. However, there is no clear consensus on how issues of accuracy, privacy, and liability should be regulated for tools such as ChatGPT. As such, it is important to exercise caution and skepticism when using PLMs, to sharpen our judgment of PLMs' answers, and to view them as tools rather than relying on them completely (Krügel, Ostermaier and Uhl, 2023).

Conclusion
In this paper, we discuss the opportunities and challenges of PLMs in urban science research, using ChatGPT as an example. PLMs play a crucial role in the study of urban institution, urban space, urban information, and citizen behaviors. The benefits of PLMs in question answering, abstract summarization, and analysis enhance text retrieval efficacy and facilitate the explication of intricate concepts in institutional documents. PLMs can facilitate the applications of new technologies and data in urban research, including through assisted programming. Additionally, the strengths of PLMs in information extraction and text classification enable text-based data to be utilized in urban research, amplify the availability of big data sources for cities, and supply new insights for urban research.
Nevertheless, PLMs still confront numerous challenges in urban research. The issues of temporal limitation, authoritative limitation, modality limitation, credibility, and weak comprehension were exposed in our examples and still pose multiple challenges. Public trust, social biases, and public safety represent significant limitations to the practical applications of PLMs in urban research. These issues require further discussion and consideration.
PLMs will become a potent instrument for urban researchers. We hope to further promote the applications of PLMs in urban research by developing fundamental models (FMs) based on the field of urban research, in order to enhance new urban research and practice in the context of big data.

Figure 1: PLMs in urban science research

Jiayi Fu: Conceptualization of this study, Methodology, Data curation, Writing - Original draft preparation, Software. Haoying Han: Data curation, Revising the draft. Xing Su: Data curation, Revising the draft. Chao Fan: Conceptualization of this study, Methodology, Writing & Revising - Original draft preparation.

Table 1: Summary of PLMs applications in urban research