1 Introduction

To promote a sustainable and thriving society (UNDP, 2015), urban space quality continues to be a classic topic in both traditional research and emerging urban analytic research (Gehl, 1987; Jacobs, 1961; Ma et al., 2024; Mouratidis, 2021). Urban space quality refers to the extent to which the built environment can fulfil citizens’ demands for services and impact people’s mental health (Herzog, 1989; Ma et al., 2024) through spatial features such as the building density (Trivic, 2023), sense of safety (Y. Kang et al., 2023) and urban greenery (Biljecki et al., 2023; Marchi et al., 2022). Enriching the understanding of urban space quality from users’ perspectives can systematically inform the placemaking process for urban development at various spatial levels (Abdul-Rahman et al., 2021; Belkahla Driss et al., 2019), revealing urban space qualities of a district formed by spatial features of affiliated streets and users’ diverse demands across the two levels. This study selects a historic district in Singapore, namely Kampong Glam, and one of its streets best known for tourism called Haji Lane for a case study on their urban space qualities.

However, few studies have evaluated the differences in urban space qualities between subjective preferences derived from user perceptions and objective information about the built environment. To bridge this gap, this study focuses on user-perceived qualities, namely, Uniqueness, Vitality, and Liveability, and proposes a novel quantitative approach that uses Google Street View (GSV) imagery as a baseline for its consistent field of view and veracity (Biljecki & Ito, 2021; Guan et al., 2022) to infer user perceptions and preferences of urban spaces from Flickr, Twitter (currently called X), and TripAdvisor, with computer vision (CV) techniques from Google Cloud Vision AI. By fusing multi-source social media data and street view imagery, this structured comparative approach can capture diverse urban perceptions and sentiments of different user groups, thus contributing to the urban space quality evaluation across district and street levels.

This study prioritizes three key urban space qualities derived from classic urban theories (Gehl, 1987; Jacobs, 1961; Lynch, 1960) due to their strong connection to user-generated data and individual-level perceptions at district and street levels, namely, Uniqueness, Vitality, and Liveability. Focusing on these user-perceived qualities allows researchers to combine objective measures and subjective user perceptions for a comprehensive evaluation of urban space quality. Uniqueness, in terms of spatial identity, is the key driver that forms the image of the city (Lynch, 1960) and shapes visitors’ sense of urban spaces (M. Li et al., 2021; Shekhar et al., 2019). Vitality is a feeling of urban space quality evoked by activities that make spaces become places (Gehl, 1987; Huang et al., 2020; Jacobs, 1961; Montgomery, 1998). Liveability is increasingly discussed in trending research from a human-centric perspective, motivated by the availability of user-generated social media content containing visitors’ experiences and sentiments (Badland et al., 2014; Norouzian-Maleki et al., 2018).

This study applies the proposed approach in a case study evaluating the urban space quality at Kampong Glam and Haji Lane. Kampong Glam (also spelt ‘Gelam’) district was traditionally a residential area for Malay royalty, but now it supports both local communities and tourists (Kumar, 2020). In the southern area of Kampong Glam lies Haji Lane, which was associated with the pilgrimage undertaken by Muslims to Mecca and Medina but is now famous for its abundant shops and graffiti (URA.SG, 2024). With the integration of findings across spatial scales, district-level analysis can reveal broader infrastructural and environmental trends while providing nuanced insights into pedestrian experiences at the street level.

Computer vision (CV) methods are widely used in quantitative urban sensing research (P. Liu et al., 2023; Qiu et al., 2022; Szeliski, 2022). The rapid advancements in Artificial Intelligence (AI) and open digital tools for data-driven methodologies can significantly benefit research in sensing and modeling urban environments (P. Liu et al., 2023; Yap et al., 2022), suggesting the potential of innovative technologies from informatics fields in urban studies (Shi et al., 2022). Pre-trained AI models have become a trending tool for urban researchers (Janowicz et al., 2020; Y. Kang et al., 2023; P. Liu & Biljecki, 2022), including Google Cloud Vision AI with technology support in image labeling and object recognition tasks (Vision AI, 2024).

This study implements the proposed approach through three comparisons for a comprehensive understanding of urban space quality from users’ perspectives:

  i. Using GSV imagery as a baseline and comparing it with Flickr imagery to extract spatial elements perceived by users and draw insights into the Uniqueness and Vitality of Kampong Glam and Haji Lane, respectively.

  ii. Comparing social media text data generated by different user groups on Twitter and TripAdvisor to specify relevant spatial elements contributing to Liveability based on the sentiments of the user-generated content about Kampong Glam and Haji Lane, respectively.

  iii. Comparing the results across district and street levels to study the differences in user perceptions and demands regarding three aspects of urban space quality (Uniqueness, Vitality, and Liveability) and to discuss the applicability of the proposed approach in various urban contexts.

The research findings justify the feasibility of the proposed approach of fusing multi-source social media data and street view imagery with pre-trained AI models to analyze diverse user groups’ perceptions, reveal users’ spatial preferences, and evaluate urban space quality at district and street levels. The results identify key factors contributing to the Uniqueness, Vitality, and Liveability of urban spaces, and thus inform the placemaking process and urban planning from a user-centric perspective. This approach can be applied to studies on diverse urban contexts at different levels with available street view imagery and user-generated data. Adjustments might be necessary to account for varying data availability, cultural differences, and diverse user groups unique to each urban context.

2 Theoretical framework

Urban space quality is complex in terms of the physical environment meeting citizens’ functional and mental demands (Abbasi et al., 2016; J. Chen et al., 2021; Smith et al., 1997). Despite extensive discussion in previous studies, the definition of urban space quality varies due to diverse evaluation frameworks (Ma et al., 2024; Mouratidis, 2021) and develops with the emerging data sources (Heng et al., 2020; Y. Wang et al., 2021). This study focuses on three aspects of urban space quality building upon classic urban theories, namely, Uniqueness, Vitality, and Liveability.

Classic theories of urban space quality evaluation focus on elements of the built environment within pedestrians’ field of vision. Edge, one of the spatial elements in Lynch's theory (1960), is the most visually engaged component of street perceptions (Simpson et al., 2019). As the pedestrians’ field of vision is mainly horizontal, the ground floor of buildings plays a vital role in their view (Gehl, 1987), especially the continuity of the street edge and the focal endings far in the distance on streets (Jacobs, 1961). Subsequent research quantifies objective elements of the street view to study urban space quality, such as the sense of safety reflected in broken windows (Fagan & Davies, 2000) and amenities (Zhang et al., 2021).

Existing research has revealed how activities contribute to urban space quality. Jane Jacobs (1961) defined urban vitality as the active street life expressed by the presence of pedestrians. She proposed that diversity, concentration, contact opportunity and aged buildings are four generators of urban vibrancy, with two additional requirements: accessibility and border vacuums (1961). Thriving activity is a key driver of socioeconomic interactions (Montgomery, 1998). Recent research (C. Kang et al., 2021; Sulis et al., 2018) analyzes activities for quantitative measurements of urban vitality, including population concentration, activity diversity, time diversity, and space diversity.

With user-generated data, emerging sociological and urban studies have investigated citizens’ experiences and satisfaction (Abbasi et al., 2016; X. P. Song et al., 2020a, 2020b) to form a user-perspective understanding of urban space quality. Owing to its immediacy, availability and quantity (Liao et al., 2022), social media data is increasingly applied to urban space quality studies to analyze social activities and individual behaviours from users’ perspective (M. Li et al., 2021). For example, social media data have proven reliable for the investigation of citizens’ experiences (Y. Song et al., 2020a, 2020b), communications (Guo et al., 2022), and emotions (Zhu et al., 2021). Thus, citizens’ perceptions can be extracted from user-generated data on social media to study users’ preferences (Lee & Kang, 2021) and opinions on urban spaces (Hausmann et al., 2020; Kruse et al., 2021).

To propose a holistic user-centric evaluation framework of urban space qualities, this study selects Uniqueness, Vitality, and Liveability for incorporation of concepts from Kevin Lynch’s (1960) and Jane Jacobs' theories (1961) focusing on the spatial features and activities, and contemporary research on users’ experiences. Revealing how spaces are perceived, used, and preferred and specifying factors of urban space qualities can inform urban planning and placemaking strategies to meet users’ demands.

The ‘Uniqueness’ or the ‘imageability’ of urban spaces is an orientation clue leading pedestrians to form the image of the city (Lynch, 1960) and perceive districts and affiliated streets. Lynch (1960) emphasizes the importance of individual perception of streets and urban spaces at the district level. Later research extending the city image further points out that the intricacy and vibrancy of activities are motivated by the street environment, including identifiable building typology and unique spatial identity (Appleyard et al., 1971; Qiu et al., 2021).

Vitality is a feeling of urban space quality formed by street activities (Huang et al., 2020). Different from Lynch’s concentration on unique spatial identity, Jacobs's (1961) and Montgomery's (1998) theories on urban vitality built the classic argument of street quality studies associated with human activities (Huang et al., 2020). Jacobs (1961) defined Vitality as the active street life expressed by pedestrian activities in the built environment. Citizens perceive and use urban spaces for necessary activities, optional activities, and social activities (Gehl, 1987), which activate communal spaces in cities and foster Vitality. As optional and social activities depend considerably on the quality of urban spaces, the diversity of optional activities partly reflects urban Vitality (Montgomery, 1998).

Liveability is reflected in citizens’ satisfaction with their individual experiences in urban spaces. Different from the observational perspective of Jacobs’s, Lynch’s, and Gehl’s theories of urban spaces, the evaluation methods of recent research have started to focus on citizens’ perceptions and satisfaction to study the Liveability of urban spaces (Badland et al., 2014; J. Chen et al., 2021; He et al., 2024; Norouzian-Maleki et al., 2018). Although the definitions of Liveability vary from the interaction between environmental and personal characteristics (Pacione, 1990) to citizens’ desires for the contentment of life (Chazal, 2010), the Australian Major Cities Unit definition proposes that liveable cities have attractive built and natural environments which can be assessed by citizens’ satisfaction (Badland et al., 2014; J. Chen et al., 2021; Mouratidis et al., 2019).

In terms of data availability, Uniqueness and Vitality are reflected in the visual characteristics of the built environment captured by GSV imagery and user-generated imagery in social media, while Liveability is best understood through user-generated content expressing individuals’ experiences and sentiments. First, Uniqueness is represented by landmarks (Lynch, 1960), architectural style (Lynch, 1995), and spatial elements of the built environment (Biljecki & Ito, 2021). Google Cloud Vision AI with CV methods (e.g., semantic segmentation and object detection) is increasingly employed to quantify spatial elements beneficial to well-being, such as visual complexity (Guan et al., 2022) and physical activity (R. Wang et al., 2019). Second, the Uniqueness and Vitality perceived by visitors can be extracted from user-generated imagery in social media (J. Chen et al., 2021; Y. Liu et al., 2020a, 2020b; X. P. Song et al., 2020a, 2020b). Third, user-generated text in social media records visitors’ experiences and sentiments, reflecting the Liveability of urban spaces and relevant factors. Researchers apply natural language processing techniques (e.g., topic modeling and sentiment analysis) to identify the main topic discussed by users (Hu et al., 2019; Jiang et al., 2024) and their sentiments (He et al., 2024; Plunz et al., 2019; You & Tunçer, 2016).

However, citizens’ diverse preferences of spatial elements at different urban scales have not been compared to uncover key spatial elements of urban space quality. Scale effects and the incorporation of data sources should be explored in future research to enhance the precision of urban space quality studies at different spatial levels (X. P. Song et al., 2020a, 2020b). Previous research has analyzed citizens’ demands and preferences based on user-generated content, such as the sentiments on transit stations (Chang et al., 2022), travel demand (Liao et al., 2022), and tourists’ preferences (Lee & Kang, 2021). Further exploration of differences among user groups can enrich the understanding of various opinions and benefit equality, diversity and inclusion, especially crucial for policymaking (Quinn et al., 2021) and placemaking strategies (Burton & Mitchell, 2006; Pineo, 2022).

Given the above discussions and inspirations from existing literature, this study fuses multi-source social media data and GSV imagery to identify key spatial elements that contributed to urban space qualities (Uniqueness, Vitality, and Liveability) with imagery data and extract sentiments of different user groups from textual data. With three quantitative comparisons and qualitative discussions referring to classic theories in urban space quality (Gehl, 1987; Jacobs, 1961; Lynch, 1995; Montgomery, 1998), the findings interpret multi-user groups’ perceptions and demands at both district and street scales to instruct user-centric placemaking strategies.

3 Data and methodology

This research conducts a cross-level case study at Kampong Glam and Haji Lane, using GSV imagery as a baseline and social media data for users’ perceptions. The proposed approach (Fig. 1) with three creative comparisons employs a combination of image and text analysis with Google Cloud Vision AI to identify relevant spatial elements and quantitatively evaluate Uniqueness, Vitality, and Liveability at district and street levels.

Fig. 1 The workflow of the three-comparison approach

3.1 Case study

Kampong Glam and Haji Lane were selected as the location of a pilot study (Fig. 2) in urban space quality assessment for their emerging development demand, cultural significance, and popularity among both locals and tourists (HAN, 2016; Kumar, 2020; URA.SG, 2024). Kampong Glam, one of Singapore's oldest urban quarters, is a mix of age-old traditions and trendy lifestyles (URA.SG, 2024). Haji Lane is a back alley filled with shops in the southwest part of Kampong Glam (HAN, 2016). This street is now a home for cafes, bars, lifestyle shops, boutiques, and graffiti walls, as well as a hangout for Instagrammers, fashionistas, artists, and troopers looking to experience a different side of Singapore (URA.SG, 2024).

Fig. 2 Site location and data collection points

3.2 Data collection

The author collected multi-source datasets by searching the keywords ‘Kampong Glam’ and ‘Haji Lane’ in 2021 across several digital platforms, namely, Google Street View (GSV) imagery, Flickr imagery, Twitter Tweets and TripAdvisor reviews, to cover a broad spectrum of user perspectives and perceptions in these two urban spaces.

From March to May 2022, the author used the Google Street View Image Application Programming Interface (API) to obtain street view images (Fig. 2). To form a complete representation of the streetscapes, 67 sampling points for district analysis were selected at the junctions, leading to 268 street view images. For street analysis, 12 sampling points were selected at an interval of 20 meters due to the limitation of the Google Street View API, leading to 48 street view images. Coordinates of the selected GSV images in Kampong Glam District can be found in the supplementary material.
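The image counts reported above are consistent with four images per sampling point (67 × 4 = 268 and 12 × 4 = 48). The following is a minimal sketch of how such requests could be assembled for the Street View Static API, assuming four compass headings per point, a 90-degree field of view, and a 640x640 image size. These settings, the example coordinates, and the helper name `build_gsv_urls` are illustrative assumptions and are not stated in the study.

```python
from urllib.parse import urlencode

# Street View Static API endpoint.
GSV_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

# Assumption: four headings per point, inferred from 268 / 67 = 48 / 12 = 4.
HEADINGS = (0, 90, 180, 270)


def build_gsv_urls(points, api_key, size="640x640", fov=90, pitch=0):
    """Return one request URL per (sampling point, heading) pair."""
    urls = []
    for lat, lng in points:
        for heading in HEADINGS:
            params = {
                "size": size,
                "location": f"{lat},{lng}",
                "heading": heading,
                "fov": fov,
                "pitch": pitch,
                "key": api_key,
            }
            urls.append(f"{GSV_ENDPOINT}?{urlencode(params)}")
    return urls


# Hypothetical coordinates for illustration only; the study's actual
# sampling points are listed in its supplementary material.
sample_points = [(1.3009, 103.8590), (1.3011, 103.8592)]
urls = build_gsv_urls(sample_points, api_key="YOUR_API_KEY")
print(len(urls))  # 2 points x 4 headings = 8
```

Keeping the heading, field of view, and pitch fixed across all points is what would make the resulting images comparable as a baseline.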

The social media datasets include all Flickr images about Kampong Glam and Haji Lane from 2015 to 2022 and textual data from 2012 to 2022 posted on Twitter and TripAdvisor. Flickr images with district (5052 images for #kampongglam and #kamponggelam) and street hashtags (4458 images for #hajilane) were retrieved via the Flickr API in 2022. Twitter Tweets datasets were scraped using district (719 Tweets for #kampongglam and #kamponggelam) and street hashtags (1671 Tweets for #hajilane) in 2022. TripAdvisor reviews (319 reviews for Kampong Glam district and 706 reviews for Haji Lane) were scraped from web pages in 2022. User information and posting times were collected along with the image and text data. All the social media data were anonymized and safely stored in OneDrive for Education purposes.

3.3 Methods and AI tools

The input data are treated in two categories: imagery and text. GSV images and Flickr images are analyzed by Google Cloud Vision AI to detect objects and obtain ten labels per image, ranked by confidence score. These labels summarize the detected objects in each image, such as sky, road, window, tree, and people. To represent the image content effectively for quantitative analysis, image labels are classified into six groups that reflect Uniqueness and Vitality. The three groups for Uniqueness are 'space' (indicating specific locations or structures like buildings and streets), 'nature' (related to natural elements like trees and skies), and 'decoration' (referring to decorative elements like art and architecture). For Vitality, the three groups are 'people' (identifying human presence), 'activity' (describing actions or events), and 'transport' (associated with modes of transportation like cars and bicycles). This classification provides a general overview of the image content, revealing spatial features potentially contributing to Uniqueness and Vitality.
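The grouping step can be illustrated with a short sketch. The label-to-group lookup below is hypothetical, since the study's full mapping is not given, and the group names follow those used in the results (Section 4). The function name `group_frequencies` and the example label lists are also assumptions made for illustration.

```python
from collections import Counter

# Hypothetical lookup from image labels to the six label groups;
# the study's complete mapping is not published in the text.
LABEL_GROUPS = {
    "building": "space", "road": "space", "window": "space",
    "sky": "nature", "tree": "nature", "plant": "nature",
    "art": "decoration", "facade": "decoration",
    "person": "people", "smile": "people",
    "event": "activity", "retail": "activity",
    "car": "transport", "bicycle": "transport",
}

# Which urban space quality each group is taken to reflect.
GROUP_QUALITY = {
    "space": "Uniqueness", "nature": "Uniqueness", "decoration": "Uniqueness",
    "people": "Vitality", "activity": "Vitality", "transport": "Vitality",
}


def group_frequencies(images):
    """Share of each label group among all mapped labels of a set of images.

    `images` is a list of label lists (one list per image). Labels that
    are missing from LABEL_GROUPS are skipped.
    """
    counts = Counter()
    for labels in images:
        for label in labels:
            group = LABEL_GROUPS.get(label.lower())
            if group is not None:
                counts[group] += 1
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


# Two made-up images with three labels each.
example = [["Building", "Sky", "Car"], ["Building", "Tree", "Road"]]
freqs = group_frequencies(example)
print(freqs)
```

Working with group shares rather than raw counts makes datasets of different sizes, such as 268 GSV images and several thousand Flickr images, directly comparable.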

To quantify the statistical differences across data sources and across district and street levels, this study applies Pearson's Chi-squared test with a simulated p-value (based on 2000 replicates) to compare the diversity of image labels and the Mann–Whitney U test (also known as the Wilcoxon rank-sum test) with continuity correction to compare the distribution of sentiment scores of posts, using the R package ‘stats’. The Chi-squared test (Plackett, 1983) is a statistical procedure for determining the difference between observed and expected data, which makes it suitable for identifying potential differences between the observed and expected label frequencies across the six groups of image labels matching Uniqueness and Vitality. The Mann–Whitney U test is chosen because it compares two groups on a single ordinal variable without assuming a specific distribution (Mann & Whitney, 1947), which suits the distribution of sentiment scores of Twitter and TripAdvisor data.
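To make the first test concrete, the sketch below computes Pearson's Chi-squared statistic for a source-by-group contingency table and a Monte Carlo p-value from 2000 replicates. It is a pure-Python illustration and not the R routine used in the study: replicate tables are generated by shuffling group labels across observations, which keeps both margins fixed, and the counts in `table` are invented for the example. It assumes every row and column total is positive.

```python
import random


def chi_squared(table):
    """Pearson's Chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def simulated_p_value(table, replicates=2000, seed=0):
    """Monte Carlo p-value with both margins held fixed."""
    rng = random.Random(seed)
    observed = chi_squared(table)
    # Expand the table into one (row, column) pair per observation.
    rows = [i for i, row in enumerate(table) for _ in range(sum(row))]
    cols = [j for row in table for j, c in enumerate(row) for _ in range(c)]
    n_rows, n_cols = len(table), len(table[0])
    at_least_as_extreme = 0
    for _ in range(replicates):
        rng.shuffle(cols)  # break any association, keep the margins
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(rows, cols):
            sim[i][j] += 1
        if chi_squared(sim) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (replicates + 1)


# Invented counts: rows are GSV and Flickr; columns are the six groups
# (space, nature, decoration, people, activity, transport).
table = [
    [120, 60, 10, 5, 5, 68],
    [90, 40, 45, 40, 35, 18],
]
stat, p = simulated_p_value(table)
print(round(stat, 3), p)
```

With 2000 replicates the smallest attainable p-value is 1/2001, which rounds to 0.000 at three decimal places.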

3.4 Research design

This study proposes a three-comparison approach integrating Google Cloud Vision AI and multi-source datasets (Fig. 1). The first comparison between GSV and Flickr imagery identifies spatial elements perceived by users relevant to the Uniqueness and Vitality at both district and street levels. The spatial elements from both sources are detected with the image label function of Google Cloud Vision AI and are compared to identify discrepancies or alignments between the physical environment (as captured by GSV imagery) and user perceptions (as captured by Flickr imagery), and thus to highlight the elements that shape the area's Uniqueness and Vitality.

The second comparison between text-based data from different social media platforms (Twitter and TripAdvisor) can uncover factors relevant to the Liveability of urban spaces from general Tweets and tourists’ reviews at both district and street levels. Users' Tweets provide valuable insights into the Liveability aspect of urban spaces, as they often contain subjective views and experiences of the city dwellers. By comparison, user reviews from TripAdvisor offer a tourist-oriented dimension to the study of Liveability as a complement to the Twitter data. The author conducted sentiment analysis and topic modeling with detailed accounts of tourists' experiences in Kampong Glam to understand the themes and topics prevalent in users' positive or negative discussions. Sentiments and topics from different platforms reveal the aspects most valued or criticized by different user groups and enrich the understanding of Liveability.

The third comparison contrasts the results across the district (Kampong Glam) and street (Haji Lane) levels, exploring overall differences in user perceptions and demands regarding Uniqueness, Vitality, and Liveability. The same data collection and analytic methods are applied to understand how perceptions and demands vary at different urban scales. Unique or common elements relevant to Uniqueness, Vitality, and Liveability at both levels provide in-depth insights into future placemaking processes at Kampong Glam and Haji Lane. This comparative approach can be reproduced in different urban contexts to gain corresponding insights into diverse user perceptions and suggestions for placemaking processes.

4 Uniqueness and Vitality of Kampong Glam and Haji Lane

The first comparison between GSV imagery and Flickr imagery reveals spatial features specifically perceived by Flickr users over all the elements of the built environment captured by GSV imagery. Compared with the objective information in GSV imagery, Flickr imagery with user-centric visual perceptions further indicates spatial elements preferred by users that contribute to Uniqueness and Vitality. Image labels are classified by six groups matching two urban space qualities: transport, people, and activity for Vitality, and space, nature, and decoration for Uniqueness.

4.1 Statistical differences between GSV and Flickr imagery

There are statistical differences between the labels of imagery from Google Street View (GSV) and Flickr, in Kampong Glam and Haji Lane respectively (Table 1). The analysis employs Pearson's Chi-squared test to assess the independence of labeling between the two imagery datasets. For Kampong Glam, a substantial Chi-squared value of 60.934 and a p-value reported as 0.000 (i.e., p < 0.001) strongly suggest significant differences in the image labels between GSV and Flickr. This result is consistent across two methodologies: one including a simulated p-value based on 2000 replicates, and the standard Pearson's Chi-squared test, which justifies the role of objective information about the built environment from GSV as a baseline. Conversely, the imagery labels detected at Haji Lane showed a Chi-squared value of 9.257, with p-values of 0.065 and 0.099, respectively, indicating a less pronounced difference between GSV and Flickr images at street level that does not reach significance at the conventional 0.05 level. These statistical outcomes underscore the focal disparities between datasets collected objectively and those provided by users, and also reveal variability in the differences between user perceptions and the built environments when zooming in to the street level.

Table 1 Statistical differences between Google Street View (GSV) and Flickr imagery labels in Kampong Glam and Haji Lane using Pearson's Chi-squared test

4.2 Uniqueness and Vitality of Kampong Glam

The comparison between GSV imagery and Flickr imagery at Kampong Glam shows remarkable differences in image contents (Fig. 3a, b). Due to the street-level perspective of GSV, the most frequent spatial elements extracted from GSV images are 'building' and 'sky'. In addition to the labels frequent in both sources, 'building' and 'plant', Flickr users captured a wider variety of subjects and interests, with a broader diversity of labels about activity, people and decoration, such as 'facade' at 2.73%, 'event' at 2.43%, and 'art' at 2%. The group frequency chart (Fig. 3c) indicates that the 'transport' category dominates GSV images, while the 'activity' and 'people' categories appear more in Flickr images, aligning with the user-generated content's social demand.

Fig. 3 Comparison I – Labels of GSV and Flickr imagery at Kampong Glam

Three features contribute to the Uniqueness of Kampong Glam: shops, arts, and religion. Uniqueness is reflected in labels about specific objects in local culture. For example, the label 'Font' is more frequent in Flickr images (Fig. 3a), partly because citizens tend to take photos of shop signs and item brands at Kampong Glam. Several frequent activity labels, namely ‘market’ and ‘retail’, emphasize the main function of this district with several retail streets and souvenir shops (Fig. 3b). Besides, the ‘visual arts’, ‘paint’, and ‘art’ labels accurately capture the features of building facades with wall painting and graffiti in Kampong Glam. One label about people, ‘hat’, is more frequent in Flickr images because of the religious background of this district.

The Vitality of Kampong Glam is reflected in streetlights, food, and recreation (Fig. 3). Due to the constraints of the label classification of Google Cloud Vision AI, only a few spatial elements potentially contributing to the Vitality of Kampong Glam are detected. Among the labels detected from GSV images, only ‘street light’ is a spatial feature of Vitality (Fig. 3a). With more labels about activities detected from Flickr images, several labels about food indicate the vibrancy of restaurants in the district, including ‘recipe’, ‘food’, and ‘cuisine’ (Fig. 3b). One limitation is that the labels provided by Google Cloud Vision AI lack details about the food and related restaurants.

4.3 Uniqueness and Vitality of Haji Lane

The label frequencies of GSV imagery and Flickr imagery at Haji Lane also present distinct distributions (Fig. 4a). Haji Lane’s GSV images remain concentrated in the label groups 'space', 'nature', and 'transport', reflecting a physical infrastructure and environment similar to those of the district. The most frequent labels (Fig. 4a, b) detected from GSV images are 'building' and 'sky', the same as those at the district level (Fig. 3a, b). The results of Flickr images show a balance between different label groups, with a larger proportion of elements about people and decoration than in the district-level findings (Fig. 4c).

Fig. 4 Comparison I – Labels of GSV and Flickr imagery at Haji Lane

Two spatial features of the street that reflect the Uniqueness of Haji Lane are detected, including abundant greenery and graffiti (Fig. 4c). Higher label frequencies in the 'space' and 'nature' groups of GSV image labels indicate the street’s spatial dynamics, especially the vegetation and potted plants related to impressive greenery views, and street art with façade designs as a key component of its Uniqueness. Some building materials are also labeled, leaving opportunities for in-depth exploration of the building typology and colors.

The Vitality of Haji Lane is formed by diverse social and commercial activities. Street-level Flickr images show a notable presence of labels related to 'activity', 'people', and 'decoration', highlighting the social and cultural activities of the street (Fig. 4a, b). For example, the labels 'event' and 'smile' indicate Haji Lane’s role as a popular venue for social gatherings. The 'decoration' category (Fig. 4c) from Flickr images also suggests a thriving environment of shops and recreational activities along Haji Lane.

5 Liveability of Kampong Glam and Haji Lane

The second comparison between Twitter Tweets and TripAdvisor reviews suggests that different user groups have different preferences regarding the experiences that represent Liveability. The statistical differences in sentiments between the two data sources reveal the variety in demands among user groups. By dividing user-generated content into groups with positive/neutral and negative sentiments, four main topics related to Liveability are identified: religion, history, space and activity.

5.1 Statistical differences between Twitter and TripAdvisor data

There are significant statistical differences between the sentiment scores of texts from TripAdvisor and Twitter, talking about Kampong Glam and Haji Lane respectively (Table 2), revealed by the Wilcoxon rank sum test with a continuity correction.

Table 2 Statistical differences between TripAdvisor reviews and Twitter Tweets about Kampong Glam and Haji Lane using Wilcoxon rank sum test with continuity correction

For Kampong Glam, a high statistic value of 22,128 and a p-value reported as 0.000 (i.e., p < 0.001) indicate a significant difference between the sentiments of TripAdvisor and Twitter content. For Haji Lane, an even more elevated statistic of 206,441.500 with a corresponding p-value of 0.000 further indicates that users of TripAdvisor and Twitter shared distinct sentiments, as they have different experiences and demands, corresponding to a more diverse user group on Twitter and a remarkable focus on tourism on TripAdvisor. The results underscore the robust statistical differences between the sentiments of content generated by general visitors on Twitter and tourists on TripAdvisor, suggesting variability among the perceptions of users from different social media platforms about the Liveability of the district and the street.
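For readers unfamiliar with the reported statistic, the sketch below shows one way to compute the rank-sum statistic W and a two-sided p-value from the normal approximation with continuity and tie corrections. It is an illustrative pure-Python version rather than the R routine used in the study, the sentiment scores in the example are invented, and which platform is passed as the first sample is an assumption. W ranges from 0 to the product of the two sample sizes, so its magnitude depends on how many posts are compared and is not directly comparable between the district and street datasets. The sketch assumes the pooled scores are not all identical.

```python
import math
from collections import Counter


def mann_whitney(x, y):
    """Return (W, two-sided p) for samples x and y.

    W counts the pairs in which a value from x exceeds a value from y,
    with tied pairs counted as one half.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted(list(x) + list(y))

    # Assign mid-ranks so that tied values share the average rank.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    w = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2

    # Normal approximation with tie-corrected variance.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    diff = w - n1 * n2 / 2
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z = (diff - correction) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p


# Invented sentiment scores in [-1, 1], for illustration only.
reviews = [0.6, 0.8, 0.4, 0.9, 0.7, 0.5]
tweets = [0.1, 0.0, 0.3, -0.2, 0.4, 0.2]
w, p = mann_whitney(reviews, tweets)
print(w, round(p, 4))
```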

5.2 Liveability of Kampong Glam

Comparative findings from sentiment scores and topic modeling between TripAdvisor and Twitter data identify factors related to the Liveability at Kampong Glam, such as cultural heritage, and reveal the differences in the feedback of two user groups, such as tourists’ focus on commercial activities. The majority of Tweets about Kampong Glam are neutral to positive as the sentiments of Twitter Tweets show a bell-curve-like distribution with a skew towards positive sentiment (Fig. 5a). TripAdvisor reviews display a similar trend but with a more remarkable concentration on neutral to positive sentiments, suggesting a generally favorable scope of visitors' experiences in the district (Fig. 5a).

Fig. 5 Comparison II – Twitter Tweets and TripAdvisor Reviews at Kampong Glam

The frequent keywords of four topics, namely, history, religion, space and activity, detected by LDA-based topic modeling, are related to the Liveability of Kampong Glam (Fig. 5c, d). Topics in history and religion are emphasized in Tweets with neutral to positive sentiments (Fig. 5b). The district’s rich cultural heritage not only forms a sense of identity in local communities but also fosters Liveability. The representative keywords are 'heritage', 'Malay', 'arab', and several Malay words (Fig. 5c). Specifically, the keywords 'mosque' and 'Sultan' refer to the iconic Sultan Mosque, a significant religious and cultural landmark of Kampong Glam. By comparison, the topics of space and activity are more important from tourists’ perspective (Fig. 5c). Dominant keywords, including 'book centre', 'shops', 'streets', and 'mosque', reflect the commercial functions and attractive spaces of Kampong Glam, highlighting its role as a historical and multicultural hub.

The proportion of negative posts is higher on Twitter than on TripAdvisor, with keywords about spatial features and personal experiences (Fig. 5d). Diverse keywords about space and activities in negative TripAdvisor reviews indicate potential issues with accessibility or unmet expectations of Kampong Glam. For example, 'night' under the topic 'religion' stems from users' negative experiences around religious spaces in the evening. Within the topic 'space', keywords such as 'narrow streets' and 'disappointed' indicate a demand for pedestrian-friendly street spaces. Negative feedback on activities, with keywords such as 'shops', 'restaurants', and 'tourist', suggests that tourists' complaints focus on recreational services.

While Kampong Glam is generally viewed positively, suggesting good Liveability based on its cultural, commercial, and aesthetic attributes, targeted urban planning that responds to tourism demand could further enhance its Liveability. Elements that contribute to Liveability in Kampong Glam can be inferred from the positive sentiments and associated keywords. Historical and religious factors considerably enhance Liveability through their indispensable role in forming the local identity. A vibrant commercial scene is also crucial, especially from tourists' perspective. Descriptors like 'beautiful' and 'nice' point to the aesthetic and recreational appeal of the district, recalling the visual arts and street facades that contribute to Vitality. Negative feedback points to improvements in spatial and recreational elements that could enhance Liveability. Complaints about 'narrow streets' suggest a pedestrian-oriented strategy that would benefit both residents and tourists. Demands for recreational experiences require additional investigation and should be considered in future policymaking processes.

5.3 Liveability of Haji Lane

Similar to the sentiment distribution of the district-level data, the majority of tweets about Haji Lane are neutral to positive, with a bell-curve-like distribution skewed towards positive sentiment (Fig. 6a). TripAdvisor reviews concentrate strongly on neutral to positive sentiments, suggesting that visitors' experiences at the street level are generally satisfactory. Keywords under the four topics, namely history, religion, space and activity (Fig. 6c, d), are compared across the two sentiment groups and the two datasets (Fig. 6b). On Twitter, keywords under the history topic highlight Malay culture and Arab Street. TripAdvisor reviews focus more on walking experiences and recreational activities under the space and activity topics.

Fig. 6 Comparison II – Twitter tweets and TripAdvisor reviews at Haji Lane

Findings from Twitter and TripAdvisor data suggest several spatial elements related to the Liveability of Haji Lane. Liveability at the street level is mainly shaped by space and activity, with fewer factors from the history and religion topics (Fig. 6c) than at the district level (Fig. 4c). More granular positive experiences are detected in TripAdvisor reviews, covering restaurants, cafes, shops, arts, and bars (Fig. 6c). This suggests that street-level analysis can provide more detail about tourism demand and inform tourism planning to enhance Liveability. Negative sentiments (Fig. 6d) are still associated with recreational experiences, which might be irrelevant to Liveability.

6 Cross-level differences in urban space quality

The third comparison sets the street-level results for Haji Lane, obtained with the same approach, against the district-level findings to discuss cross-level differences in user perceptions and demands, drawing insights into placemaking strategies to enhance the Uniqueness, Vitality, and Liveability of Kampong Glam and Haji Lane.

6.1 Statistical differences between Kampong Glam and Haji Lane

Table 3 presents a detailed statistical analysis of the differences between Kampong Glam and Haji Lane across data from the various platforms. For imagery data, Pearson's Chi-squared test and its variant with a simulated p-value (based on 2000 replicates) were applied to the frequency of image labels detected from GSV and Flickr images. Significant differences were found in the GSV data, with a Chi-squared value of 14.984 yielding p-values of 0.010 and 0.005, respectively, suggesting notable disparities in image labels or content between the two spatial levels. In contrast, the Flickr data showed no significant differences, with a Chi-squared value of 7.402 and higher p-values of 0.192 and 0.175. Since GSV imagery is collected with a constant field of view and covers the entire area, the difference between the two spatial levels supports the reliability of using street view imagery to form an objective understanding of the built environment. For Flickr imagery, users show consistent preferences for the spatial elements shared through photos, supporting the validity of the proposed approach of mining user perceptions from this platform to study spatial preferences and evaluate urban space qualities.
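
The Chi-squared comparison with a simulated p-value can be illustrated with a minimal sketch. It assumes that the label frequencies are arranged as a contingency table with one row per spatial level and one column per label category. The function names and the counts in the usage note are hypothetical and are not the data or code of this study. The simulated p-value is approximated here by shuffling the category labels, which keeps both margins of the table fixed, and is estimated as (hits + 1) / (replicates + 1).

```python
import random


def chi2_statistic(table):
    """Pearson Chi-squared statistic for a contingency table (list of rows).

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def simulated_p_value(table, replicates=2000, seed=0):
    """Monte Carlo p-value: share of shuffled tables at least as extreme."""
    rng = random.Random(seed)
    observed = chi2_statistic(table)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table into one (row, column) pair per counted label.
    row_ids = [i for i, row in enumerate(table) for count in row for _ in range(count)]
    col_ids = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    hits = 0
    for _ in range(replicates):
        rng.shuffle(col_ids)  # breaks any row/column association, keeps margins
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_ids, col_ids):
            sim[i][j] += 1
        if chi2_statistic(sim) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (replicates + 1)
```

As a usage illustration with made-up counts, `simulated_p_value([[40, 25, 10, 8, 12, 5], [30, 35, 6, 15, 9, 5]])` would compare six hypothetical label categories between a district-level row and a street-level row.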

Table 3 Statistical differences in imagery labels and text sentiments between Kampong Glam and Haji Lane

For text data, the Wilcoxon rank sum test with continuity correction was applied to sentiment scores from Twitter tweets and TripAdvisor reviews. Both datasets showed high statistic values (256,310 for Twitter and 109,220 for TripAdvisor) but yielded non-significant p-values of 0.836 and 0.529, respectively, indicating no statistically significant difference in sentiment between the texts about Kampong Glam and those about Haji Lane. These results further indicate the consistency of user-generated content on social media, suggesting an alignment of Liveability across district and street levels.
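
The rank-based comparison of sentiment scores can likewise be sketched in a few lines. The following is a minimal illustration, assuming a two-sided test with the normal approximation, a tie correction, and a continuity correction of 0.5. The function names are hypothetical, the statistic follows the common convention of the rank sum of the first sample minus n1(n1 + 1)/2, and the sketch is not the code used in this study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share the average rank) and sum(t^3 - t) over tie groups."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_term = 0
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        size = end - start + 1
        tie_term += size ** 3 - size
        start = end + 1
    return ranks, tie_term


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank sum test; returns (statistic, p-value).

    Assumes the pooled values are not all identical (non-zero variance).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_term = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = w - mean
    # Continuity correction: shrink the difference by 0.5 towards zero.
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z = (diff - correction) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w, p_value
```

Here `x` and `y` would hold the sentiment scores of the texts about the district and the street, respectively; a large p-value, as reported above, means the test finds no evidence that one set of scores tends to be higher than the other.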

6.2 Relationship between Kampong Glam and Haji Lane

Although Haji Lane is part of the Kampong Glam district, users' preferences differ in focus between the two levels. The Uniqueness and Vitality perceived by citizens, as reflected in their photos, show a functional shift from history and religion at the district level to recreational activities at the street level. Future placemaking strategies can further emphasize the dominant elements that contribute to Uniqueness and Vitality, including the diverse activities and wall paintings with graffiti at Haji Lane. Experiences related to Liveability show a similar focus at the two levels. In Kampong Glam, keywords highly relevant to positive emotions are very diverse, encompassing aspects related to cultural background, landmarks, and activities, such as Sultan Mosque, unique streets, and ethnic food. In contrast, the street-level results correspond to some of the characteristics of the district but are more focused and prominent, relating mainly to the wall paintings, shops, and restaurants on the street. This is because the active street edges (Gehl, 1987) with wall paintings and shops enable interactions between citizens and promote online communication about popular places with photos.

Given the differences in user-generated social media text at different spatial levels, policymakers should adopt different strategies when collecting specific user groups' feedback on the Liveability of the district or street. General users on Twitter and tourists on TripAdvisor have different preferences and sharing demands in terms of Liveability. Twitter users share less information at street level compared with the number of posts about Haji Lane. TripAdvisor users remain interested in both district and street, with considerable attention to specific places and activities.

7 Discussion

This case study of Kampong Glam and Haji Lane reveals spatial elements contributing to Uniqueness, Vitality and Liveability according to users' perceptions. The application of these insights into urban space qualities and user-centric placemaking methods extends beyond the local context. Furthermore, the results indicate the implications of this approach for urban planning and GeoAI research, underscoring the benefits of fusing street view imagery, multi-source social media data and AI tools for urban sensing. Specifically, this approach exemplifies how integrating diverse data sources can transform urban planning strategies globally, addressing challenges commonly faced in the field, such as data bias caused by user groups and incomplete data coverage.

7.1 Implications of the proposed approach in broader fields

The proposed structured approach can be practically applied in the preparatory phase of placemaking and urban revitalization in diverse urban contexts to form a comprehensive investigation of the built environment, users' perceptions, and users' demands for urban spaces. Identified spatial features of Uniqueness emphasize the anchor of the local culture and integrate heritage conservation with urban planning. Perceived spatial elements of Vitality specify dominant social and commercial activities in districts and streets. Variety in experiences of Liveability across different user groups indicates the necessity of inclusive placemaking strategies. The detailed investigation of the above urban space qualities can guide urban planners in the design and implementation of public spaces that enhance community engagement and satisfaction, with insights into targeted interventions in areas facing diverse demands for residence, commerce, and tourism.

Employing advanced GeoAI techniques with street view imagery for objective information not only complements traditional investigation methods but also leads to a more granular and accurate perception of the built environment. This study supports the applicability of using GSV as a baseline to quantify spatial elements of the built environment, building upon previous research in street view imagery (Biljecki & Ito, 2021; Y. Kang et al., 2020). This comparative approach can be implemented in urban planning as a virtual site investigation for both objective and subjective evaluation of citizens' demands. Object detection (J. Liu et al., 2020a, 2020b) and image classification (J. Kang et al., 2018; Kubany et al., 2020) have been widely used for element-level observation of the built environment with street view imagery. With a pre-trained CV model and street view imagery from a widely available map service, researchers can quantify spatial elements in most urban areas from robust data sources and obtain valid results.

While advanced technologies provide the tools necessary for detailed urban analysis, the real insights are gleaned from the diverse data they process. The analyses of text and imagery through these technologies shed light on nuanced urban dynamics, illustrating how different types of data can reveal varied aspects of urban life perceived and preferred by citizens. This study analyses social media data in two formats, text and imagery, which play different roles in the evaluation of urban space quality from users' perspectives. In general, textual data contains more detailed and accurate descriptions of activities and spatial preferences. The richness of its expressive meaning requires information extraction with in-depth profiling and summarization (Gu & Shen, 2019). By comparison, social media imagery usually consists of photos recording users' experiences (Y. Li & Xie, 2020; Vaziri et al., 2020). Social media users share photos for various reasons, which also reflect their sentiments when perceiving the city (Chen et al., 2017; Harvey, 2013). Elements derived from the positive or negative sentiments of tourists' reviews and from social media imagery about users' experiences can inform urban planning that aims to make cities inclusive, safe, resilient and sustainable (UNDP, 2015; United Nations, 2023).

Furthermore, the fusion of multi-source data enriches the understanding of urban spaces perceived by diverse social groups and alleviates the bias of a single data source in urban sensing. Social media data, as a kind of user-generated content, is shared by users for communication, friendship maintenance, job seeking, or self-presentation, and is generated more spontaneously and unconsciously than data from traditional methods like surveys and interviews (Olteanu et al., 2019). As a result, social media data is believed to reflect more genuine and unfiltered user perspectives, free from the biases that might arise from the structured nature of surveys or the influence of observers. Since user groups and services vary among platforms (Harvey, 2013), a combination of data from multiple social media platforms can provide multi-perspective information about one urban area perceived by different user groups (Heikinheimo et al., 2020; Martí et al., 2019). Social media data from Twitter contains citizens' general views, while TripAdvisor, as a tourist-oriented platform, can be a convincing indicator of tourists' demands. As existing research has shed light on the different demands of residents and tourists (Chen et al., 2017; Ellard et al., 1999), such a comparison can assist policymakers in developing targeted strategies for an inclusive society.

7.2 Limitations and future work

While the insights garnered from this study are based on data from specific urban areas, the methodologies employed are designed with adaptability in mind. Urban planners in different geographical and cultural contexts might adjust the data collection method and GeoAI techniques based on local data availability and research demands. This adaptability ensures that the approach remains effective in diverse urban settings, from single public spaces such as parks, to higher spatial levels, such as cities and regions.

Constraints in data availability and pre-trained AI models lead to the limitations of this study. Selecting street view images at street junctions (Fig. 2) for collection efficiency cannot cover every spatial element in the built environment. While the data fusion approach provides a comprehensive view, the reliance on social media introduces potential biases, reflecting the perspectives of active internet users more than those of populations with difficulty accessing social media platforms (Martí et al., 2019). The integration of multi-source social media data may only be suitable for research in active and popular urban spaces with user groups large enough for a valid analysis (Viñán-Ludeña et al., 2020). Additionally, the study's reliance on visual data may overlook non-visual aspects of urban spaces, such as smell (Henshaw, 2013) and sound (Rey Gozalo et al., 2018), that are crucial for a holistic understanding of urban space qualities, inspiring future studies to address these gaps. Finally, geographic coordinate information of social media posts and images was not included due to the privacy restrictions of social platforms, although some social media data with crowdsourcing services can be alternatives for geographical data (P. Liu & Biljecki, 2022; Y. Wang et al., 2021).

Google Cloud Vision AI shares the general limitations of pre-trained AI models (Bisong, 2019; Han et al., 2021), suggesting the potential for customized, fine-tuned models in future studies. As a pre-trained model, the Google Cloud Vision AI service used in this study can only extract and describe relatively coarse information and object labels from images (Bisong, 2019; Omena et al., 2021), with lower accuracy for smaller objects (Nguyen et al., 2019). As a result, label detection cannot accurately quantify spatial features such as the number of windows in the GSV imagery or other important features, including street wall continuity (Jacobs, 1961), window opening density (Gehl, 1987), and building setback (Gehl, 1987). However, with fine-tuning and proper data, the performance of pre-trained AI models can be improved in future research (Han et al., 2021).

Based on the above discussion of data and technique constraints, future research should focus on expanding the data sources to explore citizens' multi-faceted sensory and non-visual perceptions of urban environments, such as noise (Rey Gozalo et al., 2018), smellscapes (Henshaw, 2013), and air quality (Tao et al., 2019). Additionally, longitudinal studies could assess the long-term impacts of urban planning interventions inspired by GeoAI and multi-source data insights. Furthermore, emerging multi-modal deep learning methods (Qian et al., 2016; Suel et al., 2021) offer the potential to extract users' opinions from text and imagery data together with their internal relationships. The correlations between textual and visual modalities and the separation of visually representative and non-visually representative topics can deepen the understanding of users' perceptions and contribute to the integration of multiple sensory data in urban sensing research. These directions would not only refine the proposed approach for user-centric urban planning but also broaden the scope of application of data and GeoAI techniques in the field of urban informatics.

8 Conclusion

The impactful insights and the potential for future developments in urban informatics have been revealed through the integration of multi-source big data and advanced GeoAI techniques, setting a context for ongoing enhancements in how we perceive, understand and revitalize urban spaces. This section summarizes the novelty and contribution of this exploratory study and extends the implications of the research findings in urban planning, urban sensing and wider fields.

This exploratory study integrates multi-source social media data and GSV imagery with Google Cloud Vision AI to evaluate urban space quality in the Kampong Glam district and one of its affiliated streets, Haji Lane. The results identify specific physical spatial features and activities that visitors are interested in and interpret the urban space quality of the district and street from users' perspectives in terms of Uniqueness, Vitality, and Liveability. Thus, the research findings provide suggestions for placemaking processes in Kampong Glam and Haji Lane, indicating the applicability of the proposed approach in broader urban contexts.

The variety in information from different data sources at different spatial levels is underscored through the cross-level study of Kampong Glam and Haji Lane. The cross-level comparison raises awareness of the consistency in spatial features under the same cultural background from districts to streets and emphasizes how diverse amenities, services and spaces among streets shape the overall urban perception of the district. Stakeholders such as urban planners, local businesses, and community groups can draw on cross-level investigations to develop structured strategies that enhance urban space quality at different spatial levels while considering cultural and social features.

The proposed approach, which employs Google Cloud Vision AI to jointly analyze multi-source datasets, is replicable and can lower technical barriers for urban researchers. Urban designers and policymakers can apply the approach to understand the user-generated imagery of any district or street collected from social media (Y. Li & Xie, 2020), street view (Biljecki & Ito, 2021), etc., in combination with traditional research methods, such as observation and surveys, to improve the urban space quality and promote urban revitalization.

This study contributes to the field of urban space quality evaluation from a user-centric perspective, exploring a new approach for data-driven urban space quality analysis with multi-source data and GeoAI tools. The findings highlight the diverse perceptions of different user groups, especially the active spaces and activities that are of interest to people, and enrich the understanding of urban space qualities, including Uniqueness, Vitality, and Liveability. The evaluation approach can be used to identify spaces that are currently well-developed and those that are more challenged (M. Li et al., 2021; Y. Li & Derudder, 2022). On this basis, urban planners and policymakers can propose targeted strategies that address the development needs of the site (He et al., 2024; Y. Song et al., 2020a, 2020b). The nuanced insights into user preferences and spatial demands drawn from this study can inform policymakers and contribute to future placemaking processes for more unique, vibrant and liveable urban spaces.