Introduction

Urban landscapes are highly heterogeneous in space, both structurally and functionally (Band et al. 2005; Cadenasso et al. 2013; Zhou et al. 2014). This biophysical and socioeconomic heterogeneity is closely related to myriad urban problems, such as the urban heat island (Hu et al. 2016; Huang et al. 2010; Oke 1982), urban air pollution (Briggs et al. 2000; Han et al. 2014; Rodríguez et al. 2016), threats to biodiversity (Bino et al. 2008; Nilon et al. 2011), and challenges to human health and well-being (Groenewegen et al. 2006). Mapping and quantifying the heterogeneity of urban structures and functions is therefore crucial for understanding urban patterns, functions, and services, and ultimately for improving urban management.

Remote sensing has long been used to quantify and map the spatial heterogeneity of urban landscapes, especially their biophysical structure at coarse scales. For example, medium- and coarse-resolution remotely sensed images have been widely used to map and quantify urban expansion and the associated land use/land cover changes at scales ranging from individual cities to regions and the globe (Hansen et al. 2000; Xian et al. 2009). In particular, numerous studies have mapped the intensity of impervious surfaces with different data sources and algorithms (Weng 2012; Zhang et al. 2013). With the growing availability of very high spatial resolution satellite imagery (e.g., 1-m IKONOS, 0.6-m QuickBird) and aerial photos/digital imagery, remote sensing has been increasingly used to map specific landscape features such as buildings, trees, and small lawns, to understand the fine-scale spatial heterogeneity of urban landscapes (Lee et al. 2003; Ouma and Tateishi 2008; Zhou and Troy 2008).

Remote sensing has also long been used to map and quantify urban form from a functional perspective. The most common example of such a perspective is land use mapping, ranging from simple “urban” versus “non-urban” mapping (Hu et al. 2020; Jing et al. 2015) to more detailed within-urban land use classifications (Hu and Wang 2013; Wu et al. 2006). Other, less widely used classification systems include Urban Structure Types (Voltersen et al. 2014; Wurm et al. 2009; Zhan et al. 2003), the HERCULES classification scheme (Cadenasso et al. 2007), Ecotopes (Geerling et al. 2009), and Local Climate Zones (Middel et al. 2014; Stewart and Oke 2012).

Cities, or urban areas more broadly, are now widely recognized as social-ecological systems (McHale et al. 2015; Pickett et al. 2011) or social-ecological-technological systems (Grimm et al. 2017). A holistic approach that integrates biophysical structure and social function is therefore highly desirable for understanding urban form. Here, we developed a three-level classification scheme and mapped those levels with an object-based workflow, using high spatial resolution imagery and a municipal census of building size, height, and usage. We aimed to characterize the urban landscape from a comprehensive perspective and to derive useful information for urban management.

Materials and methods

Study area and data

We chose Shenzhen, Guangdong Province, China (22° 26′ 59″–22° 51′ 49″ N, 113° 45′ 44″–114° 37′ 21″ E), as our study area. It is a highly developed city with a total administrative area of 1997 km² (Fig. 1). Shenzhen was China's first “Special Economic Zone,” established in 1980, and generated a GDP of more than 2000 billion yuan in 2019. In addition to economic development, Shenzhen has also put great effort into ecological conservation. For example, in 2005 Shenzhen delineated a “Basic Ecological Control Line” to restrict urban expansion, which has slowed the loss of forests (Yu et al. 2019). The highly hybridized social and natural features of this city make it an ideal place to quantify urban form.

Fig. 1

The spatial location of the study area

We used SPOT 6 (Satellite Pour l’Observation de la Terre, French for “Earth observation satellite”) images from 2017 to map the hierarchical structure of the urban landscape (Fig. 1). SPOT 6 images consist of one panchromatic band with 1.5-m spatial resolution and four multispectral bands (blue, green, red, and near-infrared) with 6-m resolution. In addition, we used the 2015 Shenzhen building census, which consists of 594,823 building footprint polygons, each attributed with building height and building use. The building census was obtained from the Bureau of Planning and Natural Resources.

Hierarchical classification system

We constructed a three-level classification system, from coarse to fine scales, by considering the various ways in which humans perceive and use the urban landscape. Specifically, we first separated the urban ecosystem type from others such as forests and agriculture. We then mapped urban functional zones within the urban ecosystem type; these are hybrid patches that typically contain a mixture of land cover elements with built and non-built components. Finally, we mapped the land cover elements within each zone (Fig. 2). In level 1, we separated five ecosystem types: urban, forest, wetland, grassland, and farmland. In level 2, we first classified urban function zones, namely residential, commercial, industrial, transportation, and mixed zones, within the urban ecosystem type. Any areas not classified into one of these zones were merged into a scenic zone, within which we also separated transportation zones. In level 3, we differentiated eight land cover elements, namely tree canopy, grass, bare soil, water, building, road, impervious surface, and construction, within the different types of zones.
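The nesting rules of the scheme can be written down as a small lookup structure. The sketch below is an illustrative Python encoding, assuming that non-urban ecosystems map only to the scenic and transportation zones; the constant and function names are hypothetical and not part of the authors' workflow.

```python
# Hypothetical encoding of the three-level scheme; names are illustrative.
LEVEL1_ECOSYSTEMS = ("urban", "forest", "wetland", "grassland", "farmland")

# Level-2 zones mapped inside the urban ecosystem type.
URBAN_ZONES = ("residential", "commercial", "industrial", "transportation", "mixed")
# Non-urban ecosystems are merged into the scenic zone, from which
# transportation zones are also separated (assumed simplification).
NON_URBAN_ZONES = ("scenic", "transportation")

LEVEL3_ELEMENTS = (
    "tree canopy", "grass", "bare soil", "water",
    "building", "road", "impervious surface", "construction",
)


def allowed_zones(ecosystem: str) -> tuple:
    """Return the level-2 zones that may occur within a level-1 ecosystem."""
    if ecosystem not in LEVEL1_ECOSYSTEMS:
        raise ValueError(f"unknown ecosystem type: {ecosystem!r}")
    return URBAN_ZONES if ecosystem == "urban" else NON_URBAN_ZONES


def is_valid_path(ecosystem: str, zone: str, element: str) -> bool:
    """Check that an (ecosystem, zone, element) triple respects the nesting."""
    return zone in allowed_zones(ecosystem) and element in LEVEL3_ELEMENTS
```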

Fig. 2

The graphical example of the hierarchical classification system

Hierarchical classification methods

We used a top-down classification procedure, classifying ecosystem types first, then function zones, and finally land cover elements. The classification of each lower level was based on the results of the level above it. We carried out the three-level classification with an object-based methodology, segmenting objects of different sizes to match the classification scale of each level.

We first segmented the SPOT 6 image to generate the large objects of the first level and then classified these objects using rule-based classification and visual interpretation. Based on the first level, we segmented the second level by overlaying a vector layer of blocks, modified from OpenStreetMap (www.openstreetmap.org), and classified the function zones using building attributes. Based on the second level, we segmented the small objects of the third level and classified the land cover elements with supervised classification. Throughout, the classification results of the upper level provided information for the classification of the lower level, which is referred to as a top-down feedback approach (Zhang et al. 2018).
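The top-down feedback step needs each lower-level object to know the class of the object above it. A minimal sketch, assuming both levels are available as co-registered rasters (a child segment-ID raster and a parent class raster), assigns each child the majority parent class beneath it. Because the lower-level segmentation is nested within the upper level, that majority is normally the only parent class present. The function name is hypothetical.

```python
import numpy as np


def inherit_parent_labels(child_ids, parent_classes):
    """Map each child object ID to the majority parent class beneath it.

    child_ids      : 2-D integer raster of lower-level segment IDs
    parent_classes : 2-D raster (same shape) of upper-level class labels
    """
    parents = {}
    for cid in np.unique(child_ids):
        under = parent_classes[child_ids == cid]
        values, counts = np.unique(under, return_counts=True)
        # With nested segmentation there is usually exactly one value here.
        parents[int(cid)] = values[np.argmax(counts)].item()
    return parents
```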

The first level: ecosystem types

We used the multi-resolution segmentation (MRS) algorithm embedded in the eCognition™ software to segment the SPOT 6 image. MRS is a bottom-up region-merging method that consecutively merges pixels, and then the resulting objects, with similar features to generate larger objects. Spectral similarity is calculated from the input image layers (Baatz and Schäpe 2000); here, we assigned equal weights to the five original bands of SPOT 6. MRS uses a “scale” parameter to control the size of the segmented objects: in general, the higher the scale value, the larger the objects. In addition, MRS uses two pairs of weights, color versus shape and compactness versus smoothness, to adjust the shape of the segmented objects.
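For readers without eCognition, the merge decision in MRS can be sketched from the heterogeneity criterion usually attributed to Baatz and Schäpe (2000). The cost of merging two adjacent objects combines a color term (the size-weighted change in per-band standard deviation) with a shape term (compactness and smoothness), and merging stops once the cheapest merge exceeds the square of the scale parameter. This is a simplified sketch of that published criterion, not eCognition's implementation; the perimeter and bounding-box perimeter inputs are assumed to be computed elsewhere, and the default weights mirror those used in this study.

```python
import numpy as np


def fusion_cost(pix1, pix2, geom1, geom2, geom_merged, band_weights,
                w_color=0.9, w_compact=0.5):
    """Increase in heterogeneity if two adjacent objects are merged.

    pix1, pix2   : (n_pixels, n_bands) arrays of pixel values per object
    geom*        : (perimeter, bounding_box_perimeter) for each object
    band_weights : per-band weights (equal for the five SPOT 6 bands here)
    """
    n1, n2 = len(pix1), len(pix2)
    n_m = n1 + n2
    merged = np.vstack([pix1, pix2])

    # Color term: size-weighted change in per-band standard deviation.
    s1, s2, s_m = pix1.std(axis=0), pix2.std(axis=0), merged.std(axis=0)
    h_color = np.sum(band_weights * (n_m * s_m - (n1 * s1 + n2 * s2)))

    (l1, b1), (l2, b2), (l_m, b_m) = geom1, geom2, geom_merged
    # Compactness: perimeter relative to the square root of object size.
    h_compact = n_m * l_m / np.sqrt(n_m) - (n1 * l1 / np.sqrt(n1) + n2 * l2 / np.sqrt(n2))
    # Smoothness: perimeter relative to bounding-box perimeter.
    h_smooth = n_m * l_m / b_m - (n1 * l1 / b1 + n2 * l2 / b2)

    h_shape = w_compact * h_compact + (1.0 - w_compact) * h_smooth
    return w_color * h_color + (1.0 - w_color) * h_shape


def should_merge(cost, scale):
    """Merging continues only while the heterogeneity increase stays below scale**2."""
    return cost < scale ** 2
```

For two single-pixel objects with identical values, the color term is zero and only a small shape penalty remains, so the merge proceeds at scale 450; a large spectral difference between the objects raises the cost sharply.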

We first resampled the original image from 1.5-m to 12-m spatial resolution to improve segmentation efficiency. We then segmented large objects corresponding to the large patches of ecosystem types (Fig. 3). We set the scale parameter to a large value (450), chosen by visually comparing segmentation results for values from 200 to 600. We set the weights of color and shape to 0.9 and 0.1, respectively, and weighted both compactness and smoothness at 0.5 to obtain meaningful boundaries, as suggested by previous studies (Mathieu et al. 2007; Pu et al. 2011).
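The text does not state which resampling method was used. One simple option, assumed here, is block averaging: because 12 m is exactly 8 × 1.5 m, each output cell is the mean of an 8 × 8 block of input pixels. Nearest-neighbor or bilinear resampling in GIS software would be equally plausible; the function name is hypothetical.

```python
import numpy as np


def block_mean_resample(band, factor=8):
    """Average non-overlapping factor x factor blocks (1.5 m x 8 = 12 m).

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    h = band.shape[0] // factor * factor
    w = band.shape[1] // factor * factor
    trimmed = np.asarray(band, dtype=float)[:h, :w]
    blocks = trimmed.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```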

Fig. 3

The segmentation result of level 1

After segmentation, we calculated the object features normalized difference vegetation index (NDVI), normalized difference water index (NDWI), and brightness from the original image (Table 1). We then classified ecosystem types using rule-based classification. First, we classified objects with NDVI greater than 0.22 as forest. Second, we classified wetland using an NDWI threshold of 0.17. Third, we classified objects with brightness greater than 530 as urban. Finally, we manually mapped grassland and farmland and refined the classification results by visual interpretation. All threshold values were determined by trial and error.
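The sequential rules can be sketched as follows. This assumes the common index definitions NDVI = (NIR - Red)/(NIR + Red) and NDWI = (Green - NIR)/(Green + NIR) (Table 1 holds the definitions actually used), that wetland objects are those with NDWI above 0.17, and that brightness is on the same radiometric scale as the 530 threshold. Rules are applied in order, so an object takes the label of the first rule it satisfies; anything left over is marked for manual interpretation. Function names are hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def ndwi(green, nir):
    """McFeeters-style NDWI (assumed definition; see Table 1 for the one used)."""
    return (green - nir) / (green + nir)


def classify_level1(ndvi_values, ndwi_values, brightness):
    """Sequential thresholds on per-object features: the first rule satisfied wins."""
    conditions = [
        ndvi_values > 0.22,   # forest
        ndwi_values > 0.17,   # wetland (direction of the NDWI test is assumed)
        brightness > 530,     # urban
    ]
    choices = ["forest", "wetland", "urban"]
    # Remaining objects (grassland, farmland, errors) go to visual interpretation.
    return np.select(conditions, choices, default="unassigned")
```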

Table 1 Object features used for classification

After classification, we smoothed the borders of the objects by “sanding” off peninsulas, such as the long, thin roads that sprout from urban into forest ecosystems (Fig. 4), because those peninsulas in fact belong to the ecosystem enclosing them. To do so, we used the morphology algorithm embedded in eCognition™ with a circular mask 6 pixels in diameter. Any part of an object narrower than the mask was separated from the object and then reclassified according to its surrounding context.
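A rough equivalent of the border-sanding step can be sketched with morphological opening in SciPy. This swaps in `scipy.ndimage.binary_opening` for eCognition's morphology algorithm, so results will not match exactly. Pixels of the target class that do not survive opening with a 6-pixel disk are reassigned to the class of the nearest pixel outside the target class, an assumed stand-in for "reclassified according to its surrounding context". The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def disk(diameter):
    """Boolean circular structuring element with the given pixel diameter."""
    c = (diameter - 1) / 2.0
    yy, xx = np.mgrid[:diameter, :diameter]
    return (yy - c) ** 2 + (xx - c) ** 2 <= (diameter / 2.0) ** 2


def sand_peninsulas(labels, target, diameter=6):
    """Remove parts of class `target` narrower than the disk and relabel them."""
    mask = labels == target
    opened = ndimage.binary_opening(mask, structure=disk(diameter))
    removed = mask & ~opened
    # For every pixel, row/column indices of the nearest pixel that is NOT `target`.
    _, (iy, ix) = ndimage.distance_transform_edt(mask, return_indices=True)
    out = labels.copy()
    # Assign each removed pixel the class of its nearest surrounding neighbor.
    out[removed] = labels[iy[removed], ix[removed]]
    return out
```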

Fig. 4

An example of the smoothed objects by the morphology algorithm

The second level: urban function zones

Based on the ecosystem objects of the first level, we segmented the objects of the second level by overlaying the block vector layer (Fig. 5). We then classified urban function zones based on the classification results of the first level. First, we classified scenic zones by merging the forest, farmland, grassland, and wetland ecosystems. Second, we extracted transportation zones across all of Shenzhen based on the block layer. Third, we classified the urban ecosystem into residential, commercial, industrial, and mixed zones using the building usage attribute.

Fig. 5

The segmentation result of level 2

For each building, the census records many attributes, such as height, type, and usage, which were labeled through field investigation. Based on the usage attribute, we grouped buildings into four basic types: residential, commercial, industrial, and others. For each object in the second level, we calculated the percentage of each building type and classified the function zone accordingly. Specifically, if the percentage of one building type exceeded 50%, the object was classified as a zone of that building type; if no building type exceeded 50%, the object was classified as a mixed zone.
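The 50% majority rule can be sketched as below. The text does not say whether percentages are based on building counts, footprint area, or floor area, so each building carries a generic weight (footprint area is assumed in the comments); an object dominated by the "others" type is assumed to fall into the mixed zone. The names are hypothetical.

```python
from collections import defaultdict

ZONE_TYPES = ("residential", "commercial", "industrial")


def classify_zone(buildings, threshold=0.5):
    """Assign a level-2 zone from (building_type, weight) pairs of one object.

    building_type is one of residential, commercial, industrial, others;
    weight is assumed to be footprint area (count or floor area also possible).
    """
    totals = defaultdict(float)
    for btype, weight in buildings:
        totals[btype] += weight
    total = sum(totals.values())
    if total == 0:
        return None  # no buildings: this rule does not assign a zone
    for btype in ZONE_TYPES:
        if totals[btype] / total > threshold:
            return btype
    # No type above 50% (or 'others' dominates, an assumption): mixed zone.
    return "mixed"
```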

The third level: land cover elements

Within the objects of the second level, we further segmented the third level based on the original image at 1.5-m spatial resolution (Fig. 6). We also used the building vector layer as ancillary data to delineate building footprint boundaries. We used the multi-resolution segmentation algorithm with a scale value of 120 to segment the relatively fine land covers. To segment objects along land cover boundaries, we placed more weight on color (0.9) and less on shape (0.1), and weighted compactness and smoothness equally (0.5 each), following previous studies (Mathieu et al. 2007; Pu et al. 2011).

Fig. 6

The segmentation result of level 3

After segmentation, we classified roads and buildings based on the ancillary data. Then, we classified six land cover elements, namely tree canopy, grass, bare soil, water, impervious surface, and construction, using supervised classification. After that, we refined the results using a knowledge-based classification, which used the information of ecosystem types and function zones to improve the classification result. Finally, we conducted manual editing to refine the results, especially for construction and bare soil.

For the supervised classification, we first selected 30 training samples for each of the six classes by referring to the high spatial resolution SPOT 6 imagery. We then used NDVI, NDWI, brightness, Canny edge, and the original bands as classification features, as is common in previous studies (Qian et al. 2014) (Table 1). Finally, we applied a support vector machine (SVM) classifier to classify the land cover elements. For the SVM, we used the radial basis function (RBF) kernel with the parameter C set to 10⁶ and gamma set to 10⁻⁵, as suggested by a previous study (Qian et al. 2014).
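A minimal scikit-learn sketch of the SVM step with the reported hyperparameters follows; it is not the eCognition classifier the study may have used. In the study, each row of the feature matrix would hold per-object features such as NDVI, NDWI, brightness, Canny edge, and band means; the test uses synthetic two-feature data. Because an RBF kernel with gamma = 10⁻⁵ is sensitive to feature scale, these hyperparameters are only meaningful for features on a scale comparable to the original data. The function name is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def train_land_cover_svm(features, labels):
    """Fit an RBF-kernel SVM with the hyperparameters reported in the text.

    features : (n_objects, n_features) array of per-object features
    labels   : (n_objects,) array of land cover class names
    """
    clf = SVC(kernel="rbf", C=1e6, gamma=1e-5)
    clf.fit(np.asarray(features, dtype=float), np.asarray(labels))
    return clf
```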

Subsequently, we conducted a knowledge-based classification that integrated expert knowledge to improve the results. Specifically, we reclassified the land cover elements of the third level by considering the classes of the corresponding objects at the upper levels (ecosystems and function zones). For example, because previous research found that cloud shadows in mountain areas are likely to be misclassified as water owing to their spectral similarity (Amin et al. 2013; Li et al. 2013), we reclassified water as tree canopy if the corresponding object in the first level was classified as forest. Similarly, because water patches in residential zones are most likely shadows cast by buildings (Zhao et al. 2009), we reclassified water as impervious surface if the corresponding object in the second level was classified as a residential zone.
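The two context rules can be expressed as a vectorized relabeling over objects. This sketch assumes each level-3 object has already been given its level-1 ecosystem and level-2 zone (for example, by majority overlap with the upper levels); the function name is hypothetical.

```python
import numpy as np


def refine_with_context(elements, ecosystems, zones):
    """Apply the two context rules to per-object labels (parallel sequences)."""
    elements = np.array(elements, dtype=object)  # copy, so the input is untouched
    ecosystems = np.asarray(ecosystems)
    zones = np.asarray(zones)
    water = elements == "water"  # evaluated once, before any relabeling
    # Shadows in mountainous forest are spectrally similar to water.
    elements[water & (ecosystems == "forest")] = "tree canopy"
    # Water inside residential zones is most likely building shadow.
    elements[water & (zones == "residential")] = "impervious surface"
    return elements
```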

Accuracy assessment

After the three-level classification, we assessed the accuracy of the first and third levels. We did not assess the accuracy of the second level, because its classification was based on ancillary data on roads and building roofs, which were manually delineated based on the field investigation for the census and are therefore reliable. For the first and third levels, we randomly selected 30 test samples for each category based on the SPOT 6 imagery, and from the resulting error matrix we calculated the overall accuracy and the kappa coefficient.
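Overall accuracy and kappa follow directly from the error matrix: overall accuracy is the diagonal sum divided by the total number of samples, and kappa compares it with the agreement expected by chance from the row and column totals. The function name below is hypothetical.

```python
import numpy as np


def accuracy_from_error_matrix(m):
    """Overall accuracy and Cohen's kappa from a k x k error (confusion) matrix."""
    m = np.asarray(m, dtype=float)
    n = m.sum()
    p_o = np.trace(m) / n  # observed agreement = overall accuracy
    # Chance agreement from the products of matching row and column totals.
    p_e = np.sum(m.sum(axis=0) * m.sum(axis=1)) / n ** 2
    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa
```

When every class has the same number of test samples (30 here), one margin of the matrix is uniform and the chance agreement reduces to 1/k for k classes, so kappa = (OA - 1/k)/(1 - 1/k). With k = 5 and OA = 0.913 this gives about 0.89, and with k = 8 and OA = 0.875 about 0.86, consistent with the values reported in the Results.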

Results

Landscape pattern on the three levels

The overall accuracy and kappa coefficient of the first level (ecosystem types) were 91.3% and 0.89, respectively. The level 1 classification showed that forest and urban were the dominant ecosystem types within the administrative boundaries of Shenzhen, accounting for 51.4% and 45.3% of the city, respectively. Urban landscapes were located mostly in the western and northeastern parts of Shenzhen, while forest landscapes were mainly distributed in the eastern Dapeng district (Fig. 7, panel a). Wetland, farmland, and grassland accounted for only 2.3%, 0.6%, and 0.3% of the whole city, respectively (Fig. 7, panel a).

Fig. 7

Landscape pattern at each of the three levels

At the second level, the scenic zone was the dominant functional type, accounting for 53.9% of the area and distributed throughout Shenzhen (Fig. 7, panel b). Residential and industrial zones were also major functional types, with proportions of 17.7% and 15.6%, respectively. Within the urban ecosystem type, the proportions of residential and industrial zones were 38.4% and 33.8%, respectively. Residential zones were distributed all over Shenzhen, while industrial zones were mainly located in the north, and many of them were connected to scenic zones (Fig. 7, panel b). The other functional zones were relatively small, with mixed, transportation, and commercial zones amounting to only 7.9%, 3.7%, and 1.3%, respectively (Fig. 7, panel b).

The overall accuracy and kappa coefficient of the third level (land cover elements) were 87.5% and 0.86, respectively. Tree canopy was the dominant land cover element in Shenzhen, accounting for 55.6% of the city, followed by buildings (16.0%) and impervious surfaces (15.6%). The next most abundant land cover elements were roads and water, which accounted for 5.0% and 4.5% of Shenzhen, respectively. The proportions of the other land cover elements were relatively small, ranging from 0.6% to 1.7% (Fig. 7, panel c).

Landscape pattern from a multi-level perspective

Comparing the land cover composition of different ecosystems, we found that the proportions of impervious surface (29.1%), buildings (32.2%), and roads (9.6%) were much higher in the urban ecosystem than in the other ecosystems, where they were below 6.5%, 6.0%, and 1.5%, respectively. In addition, the proportion of tree canopy in the urban ecosystem was also high (22.6%), second only to that in the forest ecosystem (85.8%) (Fig. 8, panel a). In terms of the distribution of each land cover element, most impervious surface (84.8%), buildings (91.1%), roads (86.1%), and construction land (71.4%) were located in the urban ecosystem, whereas natural land cover elements such as tree canopy (81.3%), water (83.0%), grass (57.9%), and farmland (80.5%) were mostly located within non-urban ecosystem types (Fig. 8, panel b).
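The two views in Fig. 8 correspond to normalizing a single area cross-tabulation in two directions: row-normalizing gives the composition of each upper-level class (e.g., 22.6% tree canopy within the urban ecosystem), while column-normalizing gives the distribution of each land cover element across upper-level classes (e.g., 81.3% of tree canopy in non-urban types). The sketch assumes both levels are rasterized to integer class codes on the same grid; it is an illustration, not the authors' code, and the function name is hypothetical.

```python
import numpy as np


def cross_level_tables(upper, lower, n_upper, n_lower, pixel_area=1.0):
    """Area cross-tabulation between an upper-level and a lower-level class raster.

    upper, lower : integer class rasters of equal shape (codes 0..n-1)
    Returns (area, composition, distribution):
      composition[i, j]  = share of upper class i covered by lower class j (rows sum to 1)
      distribution[i, j] = share of lower class j located in upper class i (columns sum to 1)
    """
    idx = upper.ravel() * n_lower + lower.ravel()
    counts = np.bincount(idx, minlength=n_upper * n_lower).reshape(n_upper, n_lower)
    area = counts * pixel_area
    # Empty rows or columns would produce NaN; silence those warnings.
    with np.errstate(invalid="ignore", divide="ignore"):
        composition = area / area.sum(axis=1, keepdims=True)
        distribution = area / area.sum(axis=0, keepdims=True)
    return area, composition, distribution
```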

Fig. 8

Landscape pattern combining different levels

Comparing the land cover composition of different function zones, we found that the mixed, residential, industrial, and commercial zones had similar land cover compositions (Fig. 8, panel c). All four zones had large proportions of impervious surfaces, buildings, and trees, which together accounted for more than 80% of each zone (Fig. 8, panel c). In contrast, the scenic zone was dominated by trees (81.6%), while the transportation zone had a large proportion of roads (51.6%) (Fig. 8, panel c). The distribution of land covers (Fig. 8, panel d) showed that although the mixed, residential, industrial, and commercial zones had similar compositions, their total areas differed greatly: most buildings (76.0%) and impervious surfaces (62.1%) were located in residential and industrial zones rather than in commercial zones (Fig. 8, panel d).

Discussion

Classification of urban systems began by distinguishing cities from the countryside (Pickett and Cadenasso 2009; Small et al. 2005) and has evolved to document the spread of urban land covers and land uses over expanding regions (Tan et al. 2010; Xiao et al. 2006). Land cover and land use (LC/LU) classifications, even with high-resolution imagery, treat large urban regions as mosaics of distinct and sharply contrasting zones or districts. This can cause confusion in understanding the very large administrative jurisdictions that characterize many megacities, which comprise vastly different landscapes mixing many specific habitat types. A first step toward understanding such extensive jurisdictions has been to identify and map the broadest landscapes or ecosystem types within their boundaries (Fig. 3). Although functional zones can be recognized in cities, they are composed of specific, discrete cover types at lower hierarchical levels of organization, while simultaneously contributing to the structure and function of the larger landscapes in urban regions.

Our hierarchical classification system separates the urban system into three related levels: (1) ecosystem or landscape types, (2) functional zones, and (3) land cover elements. Land cover elements make up the functional zones, and the functional zones are distributed among contrasting landscape or aggregate ecosystem types. The levels are connected through an object-based classification framework. Compared with traditional classification systems, which mostly focus on a single urban characteristic, such as urban area (Cao et al. 2009; Hu et al. 2020), urban structure types (Voltersen et al. 2014; Wurm et al. 2009), or land cover (Yu et al. 2016; Zhou et al. 2014), this approach provides a comprehensive perspective and exposes a large amount of “hidden” information by quantifying patterns at and across multiple scales.

Take tree canopy as an example. Previous studies were mainly interested in the percentage of tree canopy for the whole of Shenzhen, which is 55.6% (Fig. 7, panel c). However, the multi-level analysis showed that although tree canopy was the dominant land cover element in Shenzhen, most trees (80.7%) were located in the forest ecosystem type (Fig. 8, panel b). As a result, tree canopy covered only 22.6% of the urban ecosystem type (Fig. 8, panel a), much less than buildings and impervious surfaces, which accounted for 32.2% and 29.1% of the urban ecosystem, respectively (Fig. 8, panel a). Similarly, most water (83.0%) was distributed in non-urban ecosystems; water covered only 1.7% of the urban ecosystem type, much less than its share of the whole city (4.5%). Interestingly, most construction land (71.4%) was located in the urban ecosystem, which suggests a large amount of internal renewal in Shenzhen.

By integrating structures and functions, this approach can derive additional classes of urban landscape as needed. For example, by combining level 1 and level 3, we can distinguish urban trees (18.7%) from non-urban trees (81.3%), which are subject to different social-ecological processes and require different management. Within the urban ecosystem, this approach can differentiate residential areas (level 2) by building density, building height, and tree cover (level 3), which reflect different living quality. In addition, combining classes reveals a wealth of spatial relationships, such as industrial areas adjacent to the forest ecosystem.

This multiscale information has great potential for urban management. For example, by analyzing the height and proportion of buildings in residential areas, we can identify shantytowns that might need urban renewal (Fig. 9, panel a). Information on the percentage of tree cover in each patch can also help urban managers prioritize areas for urban revitalization. Furthermore, this approach can not only identify the area and location of forest or wetland ecosystems that merit protection, but also evaluate their ecological risks by analyzing their distance from industrial zones (Fig. 9, panel b). Finally, whereas previous studies mostly classified urban areas at a fixed spatial scale (Hu et al. 2020; Voltersen et al. 2014; Zhou and Troy 2008), this approach provides a flexible way of generating customized information at city, block, and patch scales, which can support urban planning and management at the corresponding scales.
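The proximity analysis could be sketched with a Euclidean distance transform: compute, for every pixel, the distance to the nearest industrial-zone pixel, then take the minimum within each forest or wetland patch. This assumes a rasterized zone map and a labeled patch raster on the same grid; the integer zone code, the 1.5-m pixel size, and the function names are hypothetical, and a vector buffer analysis would be an equally valid approach.

```python
import numpy as np
from scipy import ndimage


def distance_to_class(zones, target, pixel_size=1.5):
    """Euclidean distance (map units) from every pixel to the nearest `target` pixel."""
    not_target = zones != target
    # distance_transform_edt measures distance to the nearest zero (False)
    # element, so the target pixels must be the zeros of the input.
    return ndimage.distance_transform_edt(not_target, sampling=pixel_size)


def min_distance_per_patch(patch_ids, dist):
    """Minimum distance to the target class for each labeled patch (IDs > 0)."""
    ids = np.unique(patch_ids[patch_ids > 0])
    mins = np.asarray(ndimage.minimum(dist, labels=patch_ids, index=ids))
    return dict(zip(ids.tolist(), mins.tolist()))
```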

Fig. 9

Identifying urban hotspots by analyzing the pattern across levels

This new approach also improved the classification accuracy of urban land cover through a top-down feedback approach (Zhang et al. 2018); that is, the classification results at the upper levels provided expert knowledge to assist the classification of the lower levels. For example, it corrected building shadows in residential areas that had been misclassified as water, reclassifying them as impervious surfaces, and it corrected forest cover that had been misclassified as water in the forest ecosystem. Previous studies often used expert knowledge of land cover change to improve classification accuracy; for example, areas classified as impervious surface in the early years of a study are unlikely to convert to water in later years (Yu et al. 2016). This new approach, however, introduces social-ecological information into the classification and broadens the application of knowledge-based classification.

Depending on research and management needs, this hierarchical framework can also be flexibly expanded or modified, especially the social-ecological hybrid patches of level 2. Take the urban heat island (UHI) as an example. To study UHI intensity, we can analyze the temperature difference between urban and non-urban ecosystems at level 1 (Hu et al. 2016; Peng et al. 2018). Within the urban ecosystem of level 1, we can further delineate local climate zones (LCZs) to study the heat island within the urban area (Leconte et al. 2015; Stewart and Oke 2012). Within different LCZs, we can then investigate the cooling effect of tree patches (Jiao et al. 2017; Qian et al. 2018). Beyond UHI, other types of social-ecological patches, such as urban structure types (Voltersen et al. 2014) or HERCULES (Cadenasso et al. 2007), can also be integrated into this framework.

Conclusions

This paper has taken the theoretical understanding of urban spatial heterogeneity and used it to develop a classification scheme that exploits remotely sensed imagery, infrastructure data available at the municipal level, and object-based spatial analysis. Applying the scheme to the megacity of Shenzhen exposed the limitations of using data only at the scale of landscape or ecosystem type (level 1) to assess, for example, the tree cover of the city. The hierarchical classification revealed that, within the large urban landscape type, tree cover is much less common than city-wide data, which include all of the coarse landscape or ecosystem types, would suggest. In other words, for effective planning and management, the landscape or ecosystem types (level 1), the use and cover of urban zones (level 2), and the fundamental land cover elements (level 3) each expose different aspects relevant to city planning and management.