1 Introduction

A major part of the global population now lives in cities; consequently, cities are growing in both complexity and dynamism. For example, urban growth is no longer restricted to horizontal expansion, as most developed cities are now growing vertically as well. In addition, new urban designs using a variety of construction materials pose unique environmental challenges. Thus, innovative urban information technologies are needed to solve the problems associated with contemporary urban design and development models, especially in the era of smart cities.

Rapid development and the dynamic growth of urban areas require innovative technologies to provide the huge and increasing amount of information about urban landscapes. Remote sensing (RS) is the science of collecting, extracting, and analyzing information about objects from images obtained without physical contact with those objects. The wide spatial coverage of spaceborne and airborne remote sensors complements the information obtained from extensive field-based inventories of urban landscapes. Remote sensing therefore has strong potential to play a pivotal role in developing the urban informatics of evolving urban spaces.

Ever-increasing improvements in spatial (from coarse-resolution to fine-resolution image models) and spectral resolution (from a few spectral bands to more than a hundred spectral bands) of remote sensing images, along with development in cyberinfrastructure and algorithms to extract information from the images, have accelerated the urban applications of remote sensing. These applications focus on various domains of urban settings, such as urban geometric and morphological models, traffic modeling, 3D urban models, urban noise and pollution management, solid waste management, tourism, and rapid-response mapping for disaster-risk reduction, and several other environmental and socioeconomic dynamics.

Since the launch of the first Earth-observation satellite in the 1970s, a wide range of remote sensing satellites has been launched, acquiring Earth-observation data in the visible (VIS) and near-infrared (NIR) portions of the electromagnetic spectrum. All acquired Earth-observation data require rigorous processing and algorithms before they are ready for analysis, and a further set of techniques is then applied to extract relevant information from the images. Therefore, knowledge of the essential characteristics of remote sensing platforms and sensors, along with an understanding of basic and advanced information-extraction methods, is required to reconstruct urban models. To this end, this chapter provides background information about the history and latest developments of optical remote sensing, the processing of remote sensing images to analyze and extract information, examples of remote sensing applications in urban or peri-urban settings, and a broad outlook on future directions and the latest developments of remote sensing-based operations in urban informatics.

2 History of Optical Remote Sensing

The term remote sensing (RS) first appeared in 1962, but its origins date back to the development of photography and of flight in the nineteenth century (Olsen 2016). The balloonist Gaspard Tournachon took photographs of Paris from a balloon in 1859, starting the era of RS. A wide range of scientists then followed Tournachon’s experiment and made many improvements. For example, Germans used aerial photographs to measure features and areas in forests, the Bavarian Pigeon Corps used pigeons to take aerial photos, and Albert Maul used a rocket to take an aerial photograph. During the 1910s, systematic RS and aerial photography developed rapidly for military surveillance and photoreconnaissance in World War I, and a series of related technologies reached a climax during the war. The most significant development of RS technology took place in World War II, when several imaging systems were achieved, such as near-infrared and thermal-infrared photography, aimed at differentiating real vegetation from camouflage, and airborne imaging radar used for nighttime bombing (Blaschke et al. 2011).

After the war, in the 1950s, RS systems advanced to a global scale and substantial progress in radar development was achieved. The first Earth-observation satellite, Landsat 1, launched in 1972 and began a new RS era. Various Earth-observing and weather satellites and sensors, such as AVHRR, Landsat, and SPOT, provided global measurements for a wide range of purposes. Attention was also paid to the development of image processing for satellite and fine-resolution imagery. The first hyperspectral sensor was developed in 1986, and the first fine-resolution satellite, IKONOS, was launched in 1999 (Blaschke et al. 2011). Currently, online platforms such as Google Earth and Google Maps collect and store massive numbers of satellite images and make them accessible to the general public, further accelerating the development of RS technology.

3 Latest Developments in Optical Remote Sensing

Over the past decades, extensive research and development in sensor technology have made it possible to collect fine-resolution and hyperspectral imagery. These sensors differ in their spatial, spectral, radiometric, and temporal resolutions. The major characteristics of well-known optical RS satellite sensors are summarized in Tables 20.1, 20.2 and 20.3. As shown in Table 20.1 and Fig. 20.1, most satellites were launched by the USA. There were a total of 791 Earth-observation and Earth-science satellites in orbit by March 2019, among which 481 were optical/multispectral/hyperspectral imaging satellites (Fig. 20.1; UCS Satellite Database 2005).

Table 20.1 Representative satellites and launching date
Table 20.2 Characteristics of representative optical satellites
Table 20.3 Primary applications of representative optical satellites
Fig. 20.1

Earth-observation satellites in orbit by March 2019

3.1 Introduction to Representative Optical Satellite Sensors

A variety of optical RS satellites have been launched for Earth-observation applications. A brief description of representative sensors is given in this section.

Since 1972, eight Landsat satellites have been launched, with Landsat 9 planned for launch in 2021. Landsat 5 was the longest-operating Earth-observation satellite, continually collecting data for 28 years from its launch in March 1984 until it was decommissioned in January 2013. Imagery from the Landsat series has been archived in the US and at Landsat receiving stations around the world, providing a unique resource for global-change research and applications in agriculture, cartography, geology, forestry, regional planning, surveillance, and education; the data can be accessed through the United States Geological Survey (USGS) EarthExplorer website.

SPOT (Satellite Pour l’Observation de la Terre) is part of the RS program set up in 1978 by France in collaboration with Belgium and Sweden. Each SPOT satellite comprises two identical fine-resolution optical imaging instruments that can be operated in either panchromatic or multispectral mode. The system was designed to explore the Earth’s resources, detect and forecast phenomena in climatology and oceanography, and monitor human activities and natural phenomena.

ASTER (the Advanced Spaceborne Thermal Emission and Reflectance Radiometer) consists of three subsystems: Visible and Near-Infrared (VNIR), Shortwave Infrared (SWIR), and Thermal Infrared (TIR). ASTER data are often used to derive maps of land surface temperature, reflectance, and elevation. It also has many applications, including monitoring vegetation, hazards, geology, land surface, hydrology, and land-cover change.

IKONOS was the first civilian fine-resolution satellite, providing images with a resolution comparable to aerial photos. Its fine resolution makes it useful for applications such as urban geography, land-use, agriculture, and natural-disaster management. QuickBird was launched in 2001 and decommissioned in 2015. Its very-fine-resolution sensors can acquire images in panchromatic and multispectral modes concurrently, and it was designed to support applications such as map publishing, land and asset management, and risk assessment. WorldView is a series of very-fine-resolution satellites with a short average revisit time. WorldView-1, launched in 2007 and still operating today, collects only panchromatic imagery but has the finest resolution, 0.41 m. WorldView-2, launched in 2009 and still in operation, can capture eight spectral bands. WorldView-3, launched in 2014, captures fine-resolution imagery in sixteen multispectral bands. WorldView-4, launched in 2016, is a multispectral, fine-resolution commercial satellite with four multispectral bands and a panchromatic band.

The Indian Remote Sensing (IRS) satellite series was launched to support the development of agriculture, water resources, forest and ecology, geology, water-conservancy facilities, fisheries, and coastline management in India. The Gravity Recovery and Climate Experiment (GRACE), a collaboration between the National Aeronautics and Space Administration (NASA) and the German Aerospace Center, is a satellite mission that monitors Earth’s gravitational field; scientists can infer changes in groundwater by measuring changes in that field. A summary of the primary applications of different satellites is given in Table 20.3.

In recent years, with the growth of commercial imagery and the launch of new satellite-based sensors, hyperspectral imaging has been becoming mainstream in the RS field, and the rapid development of artificial intelligence may open a new era of RS applications in the future.

4 Processing of Remote Sensing Satellite Images

Not all acquired RS images are ready to use, because raw images contain many distortions and deviations. These distortions can be divided into random distortions (Fig. 20.2) and systematic distortions. Random distortions can be caused by changes in the altitude, attitude, and speed of the sensor platform, atmospheric refraction, or relief displacement, while systematic distortions are caused by panoramic distortion, skew distortion (Fig. 20.3), and the Earth’s curvature. It is important to correct these errors before using RS images.

Fig. 20.2

A graphical illustration of random distortion

Fig. 20.3

A graphical illustration of skew distortion

Generally, satellite image processing operations can be divided into three stages: (i) image pre-processing, (ii) image processing, and (iii) image post-processing. Image pre-processing aims to correct distortion and to reduce noise in the data. The purpose of image processing is to extract the information stored in remotely sensed images and, where appropriate, to optimize their appearance for visual interpretation using enhancement techniques; typical operations include filtering, band ratioing, and contrast enhancement to enhance or mask image features, as well as image classification. The objective of post-processing is to further reduce the errors of image processing based on expert knowledge and ancillary information.

4.1 Image Pre-processing

Primary image pre-processing procedures include image rectification, also known as geometric correction, and radiometric correction, which deals with atmospheric error correction and the conversion from digital number (DN) to radiance. Rectification corrects distortions and includes image-to-image registration and image-to-map registration (Fig. 20.4). In this process, selected points in the image are matched to corresponding points in a map or another image to derive geometric transformation coefficients; these coefficients are then used to rectify the image geometrically. The root-mean-square error (RMSE) is used to assess the correction accuracy: the closer the value is to zero, the smaller the residuals and the more accurate the correction. Radiometric correction includes atmospheric correction and DN-to-radiance conversion; it calibrates the sensor and reduces systematic calibration and atmospheric effects. Atmospheric particles cause scattering and absorption, depending on their physical and chemical characteristics. Atmospheric correction can be conducted empirically using empirical line calibration, which forces the RS image data to match in situ spectral reflectance measurements, or with the dark pixel method, which finds the minimum pixel value of each band using histograms and subtracts that value from all of the pixels in the band.

Fig. 20.4

A typical example of geometric correction of a satellite image; a raw image and b geometrically corrected image

The pre-processing procedures produce consistent images with high scientific quality that can be directly used for scientific applications and subsequent analysis.
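Two of the quantitative pre-processing steps above, the RMSE check of a geometric correction and the dark pixel (dark-object) atmospheric correction, can be illustrated with a minimal numerical sketch. The function names and toy arrays here are ours, not part of any RS software package:

```python
import numpy as np

def rmse(map_points, rectified_points):
    """Root-mean-square error over ground control points: the closer the
    value is to zero, the smaller the residuals and the more accurate
    the geometric correction."""
    residuals = np.asarray(map_points, float) - np.asarray(rectified_points, float)
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))

def dark_object_subtraction(band):
    """Dark pixel method for one band: find the minimum pixel value,
    assumed to be pure atmospheric path radiance, and subtract it from
    every pixel in the band."""
    band = np.asarray(band, float)
    return band - band.min()
```

In practice, the RMSE would be computed over withheld control points after fitting the transformation, and the dark-object value would be read from the band histogram rather than the raw minimum.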

4.2 Image Processing

Satellite image processing includes: (i) masking or clipping area of interest (AOI), (ii) contrast enhancement, (iii) spatial filtering, (iv) spectral enhancement, (v) image classification, and (vi) object recognition and extraction.

Masking of a study area or area of interest is the first processing step, in which an image (or mosaic of images) is clipped to a region of interest. Clipping reduces the size of the image and the processing time, and focuses the analysis on the desired study area.

Contrast enhancement is used to transform satellite images for visual enhancement by stretching the input values to the maximum available range. Contrast-enhancement procedures can be applied to the entire image for better contrast among different land-cover or land-use types, or used to enhance specific features in an image, emphasizing a particular land-cover or land-use type (e.g., vegetation, soil, water, or snow) by diminishing others. Image displays may not clearly show all features, especially monochrome displays, and contrast enhancement addresses this. It is done through spectral feature manipulation and maximizes the contrast between features according to the image histogram. The most common method is a linear stretch (Fig. 20.5).

Fig. 20.5

Contrast enhancement of an image: a original image and b linearly stretched image
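A linear stretch of the kind shown in Fig. 20.5 can be sketched in a few lines. This is a simplified illustration; operational software typically stretches between chosen percentiles rather than the absolute minimum and maximum:

```python
import numpy as np

def linear_stretch(band, out_min=0.0, out_max=255.0):
    """Linearly stretch a band so that its minimum and maximum input
    values map onto the full available output range."""
    band = np.asarray(band, float)
    lo, hi = band.min(), band.max()
    if hi == lo:                       # flat band: nothing to stretch
        return np.full_like(band, out_min)
    return (band - lo) / (hi - lo) * (out_max - out_min) + out_min
```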

Spatial filtering is a process to emphasize or de-emphasize various spatial frequencies, or tonal variations, in an image. An example of spatial enhancement (filtering) is shown in Fig. 20.6. Filtering makes use of a kernel, a small square matrix of weights that is moved across the image pixel by pixel; a sharpening kernel, for example, increases the brightness of the central pixel and is depicted as a single positive value surrounded by negative values. For smoothing, the larger the kernel, the more blurred the result. A low-pass filter emphasizes low-frequency changes in brightness and de-emphasizes or smooths local details, such as by taking the neighborhood mean, while a high-pass filter de-emphasizes general low-frequency detail and emphasizes high-frequency components by exaggerating local contrast. Filters can also be used for edge preservation and noise removal. For example, the median filter is better at preserving edges in an image, and a modal (majority) smoothing filter can remove the “salt and pepper” effect in a classified image, leaving a more homogeneous output.

Fig. 20.6

An example of spatial enhancement (filtering): original image (a) and filtered image (b)
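The mean and median neighborhood filters described above can be sketched as a single windowed operation. The function name is ours, and operational packages use optimized convolution routines rather than an explicit Python loop:

```python
import numpy as np

def filter2d(image, size=3, stat=np.mean):
    """Slide a size x size window over the image and replace each pixel
    with a statistic of its neighborhood: np.mean gives a low-pass
    (smoothing) filter, np.median an edge-preserving median filter.
    Image borders are handled by reflecting the image."""
    image = np.asarray(image, float)
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    out = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = stat(padded[i:i + size, j:j + size])
    return out
```

On an image with a single bright outlier pixel, the median filter removes the outlier entirely, while the mean filter only spreads it into its neighborhood.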

Spectral enhancement comprises image transformation processes used to extract unique spectral information, combine the information in different spectral bands, and compress information from multiple wavebands into fewer bands.
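A classic example of such a band combination is the Normalized Difference Vegetation Index (NDVI), which contrasts near-infrared and red reflectance. A minimal sketch:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red),
    ranging from -1 to 1, with higher values for healthy vegetation."""
    nir = np.asarray(nir, float)
    red = np.asarray(red, float)
    return (nir - red) / (nir + red + 1e-12)   # guard against division by zero
```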

Once the data have been processed, it is up to the operator to analyze what is captured in an image. To interpret an image, the operator first has to detect, identify, and classify the objects in it. Classification methods mainly follow two approaches: unsupervised classification and supervised classification. The unsupervised approach clusters pixels based on spectral statistics, without sampling and training, while the supervised approach employs classifiers trained on sampled land-cover classes; users must define the categories of interest and examine their spectral separability before classification.

The information in a satellite image can be extracted and classified at various processing units: at the pixel level, the unit defined by the image’s spatial resolution; at the sub-pixel level, where a pixel is spectrally unmixed to estimate the proportion of each land-cover feature within it; and at the object level, where homogeneous pixels are grouped into objects, an approach primarily applied to very-fine-resolution images in which a single object spans many pixels. Generally, sub-pixel and object-based classification routines are implemented for information extraction over urban areas. For example, a linear spectral unmixing model was applied to an IKONOS (4 m spatial resolution) image to estimate the contributions of trees and grasses in the urban landscape of Hong Kong (Nichol and Wong 2007).
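Linear spectral unmixing of the kind applied in such studies can be posed as a least-squares problem. The sketch below uses toy endmember spectra of our own choosing; operational unmixing enforces the non-negativity and sum-to-one constraints directly in the solver rather than by clipping afterwards:

```python
import numpy as np

def unmix(pixel, endmembers):
    """Linear spectral unmixing: solve pixel = endmembers @ fractions by
    least squares, where endmembers is a (bands x classes) matrix of pure
    spectra.  Fractions are clipped to [0, 1] and renormalized to sum to
    one, a simple stand-in for a fully constrained solver."""
    fractions, *_ = np.linalg.lstsq(np.asarray(endmembers, float),
                                    np.asarray(pixel, float), rcond=None)
    fractions = np.clip(fractions, 0.0, 1.0)
    return fractions / fractions.sum()
```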

Supervised techniques rely on user-defined training sites describing the nature and number of possible land-cover classes (Mather 2011). The most significant and conventional decision rules of supervised classification include the maximum likelihood, nearest neighbor, and parallelepiped decision rules.
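Of these decision rules, the nearest-neighbor rule is the simplest to sketch. The example below is illustrative only; real training sites would supply many samples per class and many spectral bands:

```python
import numpy as np

def nearest_neighbor_classify(pixels, train_pixels, train_labels):
    """Nearest-neighbor decision rule: assign each pixel the label of the
    spectrally closest training sample (Euclidean distance in feature space)."""
    pixels = np.asarray(pixels, float)
    train = np.asarray(train_pixels, float)
    out = []
    for p in pixels:
        out.append(train_labels[int(np.argmin(np.linalg.norm(train - p, axis=1)))])
    return out
```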

The unsupervised approach is optimal when there is not enough prior ground-truth information about the area of interest (Mather 2011). According to analyst-defined parameters, unknown image pixels are iteratively clustered until either the proportion of pixel class values remains unchanged or a maximum number of iterations is reached (Jensen 2009). The three most commonly used clustering algorithms are k-means clustering, fuzzy c-means (or modified k-means), and ISODATA (iterative self-organizing data analysis technique).
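A minimal k-means clusterer illustrates the iterative assign-and-update loop shared by these algorithms. This is a sketch only: ISODATA additionally splits and merges clusters, and the convergence test on class proportions is omitted here in favor of a fixed iteration count:

```python
import numpy as np

def kmeans(pixels, k, iters=20, seed=0):
    """Minimal k-means: repeatedly assign each pixel to the nearest
    cluster center, then recompute each center as the mean of its
    assigned pixels."""
    rng = np.random.default_rng(seed)
    pixels = np.asarray(pixels, float)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        # distance from every pixel to every center, then nearest center
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = pixels[assign == c].mean(axis=0)
    return assign, centers
```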

With the launch of IKONOS in 1999 (Goetz et al. 2003), intra-class spectral variation and inter-class spectral confusion increased in fine-resolution satellite imagery. Because of the higher pixel-to-pixel variability and the information contained in patch-based landscape structures, classical per-pixel approaches to image analysis are becoming outdated. Recently developed object-based image analysis techniques of pattern recognition overcome these difficulties by first segmenting the image into multi-pixel image-object primitives according to both the spatial and spectral features of groups of pixels.

Over the past decade, there has been a noticeable shift in the analysis of Earth-observation (EO) data, from what has been predominantly 30 years of per-pixel multispectral-based approaches, towards the development and application of multi-scale object-based analysis. New concepts of object-based analysis, such as the fractal net evolution approach (FNEA), linear scale-space and blob-feature detection (SS), and multi-scale object-specific segmentation (MOSS) were developed for information extraction from RS data stored in the form of digital images (Mallinis et al. 2008).

In addition, a wide range of advanced classification approaches has been developed in recent years to solve a variety of problems arising with fine-resolution data sets and complex urban environments. The new methods and approaches from machine learning and pattern recognition include artificial neural networks (ANN), deep learning methods, decision trees, support vector machines, extreme learning machines, an artificial immune system, active learning, semi-supervised learning, binary tree support vector machine, and random forest. Other modern techniques also include ensemble learning based on multiple learners, spatial-spectral classification, multi-kernel support vector machine, wavelet analysis, phenology-based classification, kernel k-means, and expectation-maximization (Xue et al. 2015; Du et al. 2012; Fernandez-Delgado et al. 2014; Lu and Weng 2007; Mountrakis et al. 2011; Tan and Du 2011).

Combining multiple RS data sets, advanced urban feature extraction algorithms, and accurate classification algorithms, an urban information system has been developed to effectively monitor the rapidly evolving urban areas and their impact on the environment (Kadhim et al. 2016). Recent urban applications of RS comprise urban green spaces mapping, aerosol monitoring, urban heat island effect, automatic feature extraction (e.g., roads, buildings, and trees), relationships between land-use and surface temperature, 3-dimensional geometric models for urban heat island, urban energy-efficiency models, and mapping migrant housing in mega-urban centers (Blaschke et al. 2011; Hamdi 2010; Jin et al. 2011; Hofmann et al. 2011; Miyazaki et al. 2011; Hermosilla et al. 2011; Rinner and Hussain 2011; Hay et al. 2011; Geiß et al. 2011; Liu and Zhang 2011; d’Oleire-Oltmanns et al. 2011). Also, some modern urban RS methods are focusing on integrating multiple RS (night light imagery and multispectral indices) and geolocation datasets using machine learning approaches for urban informatics application of RS (Xia et al. 2019).

In the past couple of decades, with the advent of very-fine-resolution remote sensing images (1 m or less), there has been a major shift in information extraction from conventional pixel-based classification towards object-based classification and target-object extraction over urban areas. Modern techniques of machine learning focus on extracting typical urban features such as roads, buildings (more specific characteristics of buildings), cars, and urban trees, rather than classifying whole images or mapping urban sprawl.

4.3 Image Post-Processing

After determining the classes of image objects, image post-processing usually includes map production, raster-to-vector conversion, and image interpretation. The information in images needs to be converted to land-cover classes. Applying a majority filter to remove “salt and pepper” noise in pixel-based land-cover maps is the most commonly applied post-classification process. In urban areas, expert knowledge and ancillary information, such as population density, may be required to distinguish between spectrally similar high-density residential areas and commercial buildings. Current technologies provide some automated procedures for detection and identification, but it is ultimately left to the operator to interpret the results.
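The majority filter mentioned above can be sketched as follows; the function name is ours, and GIS packages provide optimized equivalents:

```python
import numpy as np
from collections import Counter

def majority_filter(class_map, size=3):
    """Replace each pixel of a classified map with the majority class in
    its size x size neighborhood, removing isolated 'salt and pepper'
    pixels.  Map borders are handled by repeating the edge values."""
    class_map = np.asarray(class_map)
    pad = size // 2
    padded = np.pad(class_map, pad, mode="edge")
    out = class_map.copy()
    for i in range(class_map.shape[0]):
        for j in range(class_map.shape[1]):
            window = padded[i:i + size, j:j + size].ravel()
            out[i, j] = Counter(window.tolist()).most_common(1)[0][0]
    return out
```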

5 Applications of Optical Remote Sensing

Recent advanced technologies have greatly expanded what RS can do. Since 1995, RS has no longer been restricted to military and government use, and rapidly developing technologies have allowed the expansion of applications, such as monitoring urban and population growth, town planning, weather forecasting, crop prediction and forecasting, forest and rangeland monitoring, air-quality monitoring and assessment, and surface-material detection, to name a few. Infrared cameras have become commercially available and can be used to assess the health condition of vegetation, and hand-held devices carried on helicopters can record heat signatures to monitor the urban heat island effect.

For coastal water-quality monitoring, RS data sets, which combine a synoptic viewpoint with the ability to measure the energy reflected from the water surface in different spectral regions, are increasingly available. For example, improved estimation of chlorophyll-a concentrations in the coastal waters of Hong Kong has helped in the detection of algal blooms, including their intensity and extent. For vegetation monitoring, aerial photographs and fine-resolution satellite images can be used to map secondary vegetation succession. Medium-resolution Landsat satellite images can provide satisfactory results for mapping deforestation and degradation, while coarse-resolution satellite images, such as those from MODIS, are suited to monitoring the impact of drought on vegetation moisture conditions. Satellite RS is also widely used to study atmospheric aerosols, suspended particles in the atmosphere emitted from natural and anthropogenic sources. These particles affect climate, air quality, and atmospheric visibility, and are also associated with public health. Satellite RS is an effective and unique technique for retrieving spatial aerosol optical thickness over the globe; sensors such as MISR, MODIS, and the Visible Infrared Imaging Radiometer Suite (VIIRS) can retrieve aerosol optical thickness.

5.1 Land-Use and Land-Cover Mapping

Land-cover refers to the features on the Earth’s surface, while land-use indicates the human activities on a particular land parcel (Lillesand et al. 2008). Detailed land-cover mapping can be utilized in urban planning, land-use monitoring, change-detection analysis, and policymaking. With the development of RS technology, satellite images now achieve good visual quality and support more practical applications at local or territory-wide scales, such as urban land-use classification (Lu and Weng 2009; Pacifici et al. 2009), environmental monitoring (Knight et al. 2013), and land-cover change detection (Potapov et al. 2017).

5.1.1 Multi-scale Object-Oriented Segmentation and Classification Method (MOOSC)

To improve land-use land-cover (LULC) mapping effectively and efficiently, the multi-scale object-oriented segmentation and classification method (MOOSC) was developed (Nichol and Wong 2008). The method was implemented for habitat mapping of a mountainous and ecologically diverse area covering Tai Mo Shan and Shing Mun Country Parks in Hong Kong, using fine-resolution IKONOS satellite images. It starts by grouping homogeneous pixels into image objects or segments at their respective scales; a five-level decision-tree classification is then constructed to classify each feature or object. Apart from the four native multispectral bands of the IKONOS images, additional layers of NDVI (Normalized Difference Vegetation Index), a chlorophyll index, a digital elevation model (DEM), and three texture bands were used in the segmentation and classification procedures. The minimum mapping unit (MMU) of the classification map was about 150 m2.

This study provides results accurate enough to substitute for traditional mapping from aerial photographs. The major merits of the method are: (i) the potential to produce more accurate results than traditional classification due to its wide range of parameters, such as spectral information, texture, shape, and size; (ii) the use of a segmentation process to identify and delineate meaningful targets on images (segmentation is in effect an automated digitizing method for delineating target boundaries, and the availability of classification outcomes in vector format is a considerable merit of an object-based approach compared with raster-based maps from conventional classification methods); and (iii) cost-effectiveness, since it can achieve accuracy comparable to the manual interpretation of aerial photographs at only one-third of the cost.

5.1.2 Hybrid Object and Pixel-Based Classification (HOPC)

Object-based classification works well in homogeneous areas with similar spectral signatures, while pixel-based classification works in heterogeneous or fuzzy areas; neither can be applied alone to broad land-cover classification, especially over vegetated areas. A new approach, hybrid object and pixel-based classification (HOPC, also referred to as hybrid-MOOSC), has been developed by integrating multi-scale object-based segmentation, decision-tree classification, and pixel-based classification to classify the heterogeneous natural landscapes of Hong Kong from fine-resolution satellite images. The approach combines SPOT-6 multispectral images, a fine-resolution DEM, and a digital surface model (DSM). Its rationale is to use an object-based approach over homogeneous areas and a pixel-based approach over fuzzy or uncertain areas. The individual accuracy of habitat classification for mixed classes, such as isolated trees and shrubs in open grassland, has been significantly improved by this approach. The classification results, as shown in Fig. 20.7, can be fully utilized in urban planning, land-use monitoring, and change-detection analysis in local and territory-wide classification, with promising potential for classifying urban areas from very-fine- and fine-resolution satellite images.

Fig. 20.7

Land-cover map of the entire territory of Hong Kong using hybrid-MOOSC

Multi-resolution segmentation was applied to create objects with coherent spectral characteristics; it is a process during which pixels with similar spectral characteristics are merged into an image object. Classification is then conducted on the image objects by assigning them to specific land-cover types. Ideally, an image object comprises only one class, but at any satellite-image resolution, mixed-class objects with similar spectral values still occur. Therefore, this study used a rule-based separation of pure objects and fuzzy objects (decision rules for each class). The thresholds were defined by analyzing the sampling histograms of various features (such as NDVI, blue-red ratio, red ratio, and object height) of the image objects corresponding to each land-cover class. Most image objects, corresponding to homogeneous classes, were correctly classified. However, some image objects could not be classified efficiently because their feature properties, such as spectral response, overlap, resulting in fuzzy areas.
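Threshold-based decision rules of this kind can be illustrated with a toy rule set; the feature names and threshold values below are hypothetical, chosen for illustration, and are not those derived from the Hong Kong study’s histograms:

```python
def classify_object(features):
    """Toy rule-based classification of one image object from its
    feature properties (hypothetical thresholds).
    features: dict with 'ndvi' and 'height' (height in metres)."""
    if features["ndvi"] < 0.2:
        return "non-vegetation"
    if features["height"] > 5.0:
        return "woodland"
    if features["height"] > 0.5:
        return "shrubland"
    return "grassland"
```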

A fuzzy object contains two or more classes at a certain spatial scale; for example, an object may contain both grassland and open shrubs that cannot be separated into two objects at the multi-resolution segmentation stage. In fuzzy objects, feature properties are averaged over classes and are therefore not distinctive from those of pure classes, as they usually overlap in the sampling histograms. For fuzzy objects, refinement is therefore needed to achieve a more accurate classification result. For this purpose, a pixel-based segmentation was performed on the fuzzy objects, dividing the large objects back into individual pixels, which were then reclassified into their corresponding classes. The advantage of the object-based approach is that it alleviates the original noise, while the pixel-wise method is good at preserving the details of ground objects, especially in fuzzy areas, which are transition stages between habitat classes in a landscape. The proposed HOPC is thus useful for improving the classification of a fine-resolution image by combining both approaches.

The high accuracy of the HOPC result is mainly due to its hybrid approach, which combines the advantages of object-based and pixel-based classification with flexible expert judgment. The object-based fuzzy areas were further broken down into pixels and reclassified into the corresponding classes, which helped to increase the overall accuracy significantly. If only pixel-based classification is adopted, for example maximum likelihood classification (MLC), the object perspective is not considered, so many homogeneous areas contain inconsistent classes after classification (the salt-and-pepper effect). With object-based classification alone, homogeneous objects can be segmented first and then classified, but this does not deal well with the borders of objects, which usually introduce fuzzy areas.

5.2 Urban Vegetation Phenology

Vegetation phenology is the timing of seasonal developmental stages in plant life cycles. It has been gaining considerable attention due to its implications for water, carbon, and energy cycles, and even human health. Vegetation phenology is sensitive to environmental conditions. Urbanization changes these conditions (e.g., it alters the local climate and introduces more artificial light) and thus affects vegetation phenology. Studying urbanization-induced phenology shifts provides insight into how vegetation responds to environmental change. Considering that urbanization is accelerating around the world, addressing this question will also help in investigating future ecosystem scenarios under the pressures of global climate change and population growth.

Several studies have used RS data to investigate the urbanization effects on vegetation spring phenology in different cities (Li et al. 2017). These investigations have reached the same conclusion, that vegetation spring phenology in urban areas occurs earlier than in surrounding rural areas.

However, the magnitude of this rural-urban difference varies considerably among studies. Yao et al. applied 2001–2015 MODIS EVI data to study phenology change in all cities of northeast China and revealed that spring phenology in urbanized areas advanced 0.79 days/year more than in rural areas over this period (Yao et al. 2017). Li et al. used 2003–2012 MODIS EVI data to study phenology change in more than 4500 urban clusters in the conterminous United States (Li et al. 2017). They found that phenology changes are related to urban area size: a tenfold increase in the size of a city corresponds to spring phenology occurring about 1.3 days earlier. More studies are needed to explore the reasons for these diverse urban effects on vegetation phenology.

5.2.1 Urban Vegetation Phenology of Beijing

A study was conducted to implement phenology-based vegetation monitoring methods in Beijing (i) to explore the spatial pattern of vegetation phenology along the urban–rural gradient; and (ii) to examine the relationship between vegetation phenology and urban environmental factors, including both air temperature and artificial light (Yao et al. 2017). The data used in this study included the MODIS EVI time series for 2012 (MOD13Q1 Version 6, 16-day composite, 250 m resolution), hourly air temperature in 2012 from 232 meteorological stations in Beijing, and nighttime light data from VIIRS in 2012.

The method proposed by Piao et al. was used to detect the start of the season (SOS) and end of the season (EOS) from the EVI time series (Piao et al. 2006). This method first computes a reference EVI curve by averaging multi-year EVI curves, and then finds SOS (when 20% of the seasonal amplitude is reached during the green-up period) and EOS (when 60% of the seasonal amplitude is reached during the brown-down period) in the reference EVI curve. Next, the EVI values in the reference curve corresponding to SOS and EOS are selected as thresholds. Then, the EVI curve of each year is fitted with a polynomial function. Finally, the SOS and EOS of each year are detected from the fitted curve using these thresholds.
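The threshold step of this procedure can be sketched as follows. The multi-year averaging and the per-year polynomial fit are omitted, so the input is treated as an already-smoothed EVI curve, and the day-of-year samples are synthetic.

```python
# Simplified sketch of amplitude-threshold phenology detection: SOS is the
# day the EVI curve rises through 20% of the seasonal amplitude, EOS the day
# it falls through 60%. Linear interpolation replaces the polynomial fit.

def _cross(d0, v0, d1, v1, thresh):
    """Linearly interpolate the day-of-year where EVI crosses `thresh`."""
    return d0 + (thresh - v0) / (v1 - v0) * (d1 - d0)

def detect_sos_eos(doy, evi):
    lo, hi = min(evi), max(evi)
    peak = evi.index(hi)
    sos_thresh = lo + 0.2 * (hi - lo)   # 20% of amplitude, green-up limb
    eos_thresh = lo + 0.6 * (hi - lo)   # 60% of amplitude, brown-down limb
    sos = eos = None
    for i in range(peak):                       # rising limb before the peak
        if evi[i] <= sos_thresh <= evi[i + 1]:
            sos = _cross(doy[i], evi[i], doy[i + 1], evi[i + 1], sos_thresh)
            break
    for i in range(peak, len(evi) - 1):         # falling limb after the peak
        if evi[i] >= eos_thresh >= evi[i + 1]:
            eos = _cross(doy[i], evi[i], doy[i + 1], evi[i + 1], eos_thresh)
            break
    return sos, eos

# Synthetic triangular EVI curve: green-up to day 200, brown-down afterwards.
doy = [0, 100, 200, 300, 365]
evi = [0.2, 0.4, 0.6, 0.6 - 0.4 * 100 / 165, 0.2]
sos, eos = detect_sos_eos(doy, evi)   # approximately day 40 and day 266
```

With real 16-day composites the curve is first smoothed (the polynomial fit in the original method) before the crossing days are located.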

The SOS result (Fig. 20.8a) shows the spatial distribution of green-up onset in 2012, from which we can see that the onset dates of vegetation green-up in the urban area occurred earlier than in the surroundings. The spatial distribution of EOS (Fig. 20.8b) shows that the onset date of vegetation dormancy in urban areas is generally later than in the surrounding rural areas. In addition, both SOS and EOS show intricate spatial patterns in the urban expansion area, indicating that the vegetation there is heterogeneous.

Fig. 20.8

SOS (a) and EOS (b) of Beijing detected from MODIS EVI time series in 2012

The correlation analysis between air temperature and phenology shows that SOS is negatively correlated with spring air temperature (R = −0.23, p < 0.01), while EOS is positively correlated with autumn air temperature (R = 0.16, p < 0.1). SOS is also negatively correlated with nighttime-light intensity (R = −0.22, p < 0.01), while EOS has no significant correlation with nighttime lights. These results suggest that both the urban heat island and artificial light may affect vegetation growth in the urban environment, and that this effect is strongest in urban centers and decreases toward rural areas.
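Correlations of this kind are plain Pearson coefficients between phenology metrics and environmental variables. A self-contained sketch with made-up sample values, not data from the study:

```python
# Pearson correlation between spring air temperature and SOS day-of-year.
# Sample values are invented for illustration.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Warmer springs paired with earlier SOS yield a negative correlation.
spring_temp = [10.2, 11.0, 11.8, 12.5, 13.1]
sos_doy     = [112, 110, 107, 103, 100]
r = pearson_r(spring_temp, sos_doy)   # strongly negative for this toy sample
```

The sign convention matches the reported results: a negative R for SOS means warmer (or brighter) pixels green up earlier.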

5.3 Urban Heat Island Mapping

Urban heat island (UHI) refers to the phenomenon that air and surface temperatures in an urban area are higher than those in surrounding rural areas. This temperature difference can range from 1.5 to 4 °C in summer daytime and from 2 to 6.5 °C in winter daytime, and an even more significant UHI effect is expected at night and in the early morning. The main causes of UHI include (i) compact urban structure, such as high-density, high-rise buildings; and (ii) anthropogenic heat released by human activities, for example from transportation and electricity use. The released heat is trapped, resulting in higher temperatures in urban areas (for a discussion of the computational issues of UHI, see Chap. 41).

5.3.1 New Emissivity and Land Surface Temperature Retrieval Method

As a city of high-rise buildings and high building density, Hong Kong suffers from the UHI effect. UHI monitoring is therefore urgently needed, and studies have been conducted to improve UHI modeling by developing algorithms that enhance the retrieval of heat-related parameters.

Emissivity, which accounts for the proportion of radiation emitted from a surface, is a crucial parameter in retrieving land surface temperature (LST); hence its accurate retrieval is needed. Yang et al. (2015) proposed a method for estimating effective emissivity using the sky-view factor. This factor represents the portion of the sky that can be seen from the ground and is derived from airborne LiDAR data, land-cover classification data, and building data. The study shows a high correlation between effective emissivity and the sky-view factor, with a correlation coefficient above 0.90. By additionally considering scattering, that is, the reflection effect of adjacent pixels, a refined model, the urban emissivity model based on the sky-view factor (UEM-SVF), was developed to estimate effective emissivity more accurately. Figure 20.9 shows the validation of the emissivity derived from the UEM-SVF model against ASTER satellite images.

Fig. 20.9

Validation of effective emissivity derived from the UEM-SVF model

In addition to the sky-view factor, more urban-geometry factors were included to improve emissivity retrieval, resulting in an improved urban emissivity model based on the sky-view factor (IUEM-SVF) (Yang et al. 2015). The additional geometric factors are (i) facet emission within an instantaneous field of view (IFOV); (ii) reflection of facet emission by adjacent facets; and (iii) scattering of emitted and reflected radiation in 3D space. Temperatures of Urban Facets in 3-D (TUF-3D), a microscale radiative-transfer code using an energy-balance model, was employed to assess the accuracy of IUEM-SVF. The results suggested that including these geometric considerations improves the retrieval accuracy of effective emissivity, with good agreement between IUEM-SVF and TUF-3D. However, when emissivity is more variable, the retrieval accuracy of effective emissivity decreases.

With effective emissivity accurately determined, the results can be used in applications such as LST retrieval. Yang et al. (2016) applied the effective emissivity derived from IUEM-SVF to obtain LST from a nighttime ASTER satellite image.

5.3.2 Anthropogenic Heat Flux Modeling

Anthropogenic heat modeling is another important area in understanding UHI, since anthropogenic heat is one of its major causes in a city. Wong et al. (2015) developed a novel algorithm for retrieving anthropogenic heat from satellite images over Hong Kong, taking the territory's complex land cover into account. The algorithm is based on the conventional energy-balance model, modified for the heterogeneous characteristics of the land cover. The anthropogenic heat flux derived over Hong Kong on October 11, 2012, is illustrated in Fig. 20.10, and the anthropogenic heat was found to be correlated with building height and building density (Figs. 20.11 and 20.12). Within urban areas, the results showed that commercial areas emit the most anthropogenic heat flux, followed by industrial areas (Fig. 20.13).
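A hedged sketch of the residual form of the conventional urban surface energy balance, Q* + Q_F = Q_H + Q_E + ΔQ_S, from which the anthropogenic heat flux Q_F falls out as the residual. The study's land-cover-specific modifications are not reproduced, and the flux values below are illustrative only.

```python
# Residual of the urban surface energy balance:
#   Q*  net all-wave radiation      Q_H  turbulent sensible heat flux
#   Q_E turbulent latent heat flux  dQ_S net heat storage change
# All fluxes in W/m^2; sample values are invented for this sketch.

def anthropogenic_heat_flux(q_net, q_sensible, q_latent, q_storage):
    """Return Q_F = Q_H + Q_E + dQ_S - Q*."""
    return q_sensible + q_latent + q_storage - q_net

q_f = anthropogenic_heat_flux(q_net=450.0, q_sensible=250.0,
                              q_latent=120.0, q_storage=130.0)  # 50.0 W/m^2
```

Each term on the right-hand side is itself estimated from satellite-derived quantities (e.g., LST, albedo, land cover), which is where the land-cover-dependent modifications enter.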

Fig. 20.10

Anthropogenic heat flux over Hong Kong on October 11, 2012

Fig. 20.11

Relationship between anthropogenic heat and building height

Fig. 20.12

Relationship between anthropogenic heat and building density

Fig. 20.13

Comparison of anthropogenic heat with different land-use types

By modeling the anthropogenic heat flux over the entire Hong Kong territory from satellite images, the general pattern of anthropogenic heat can be extracted, and the relationships between anthropogenic heat and urban geometry and characteristics can be investigated. These findings improve our understanding of the formation, distribution, and magnitude of UHI, and can assist experts in decision-making about mitigating the UHI effect.

5.4 Rock Outcrops Identification

Rock outcrops are parts of the bedrock that are fully exposed at the terrain surface, and they are strongly related to geologic hazards such as landslides and rockfalls. The exposed rock surface is subject to chemical and physical weathering, which increases the risk of landslides or rockfalls. In high-density cities, buildings and infrastructure developed at high density on steep slopes raise concerns about the stability of urban infrastructure and city development (Owen and Shaw 2007). The traditional ways to map rock outcrops are field measurement and aerial photo interpretation (API). Field measurement can involve the following approaches: (i) a structural geologist carrying a GPS tracker to locate the exposed segments; (ii) identification of angle and direction with a clinometer and geologic compass; and (iii) identification of the geological faults and rock types of each exposed segment based on mineral characteristics, fossils, and geological ages. However, field measurement has several limitations, including the accessibility of rock outcrops and the time-consuming work of mapping. To tackle these problems, API has been used for mapping rock outcrops. The advantage of API is that it can locate rock outcrops in areas that are inaccessible to fieldworkers, and with the extensive coverage of a flight plan it can map rock outcrops over a large spatial extent, up to an entire city such as Hong Kong. The major issue with API is that it remains time-consuming, since rock outcrops are identified through a knowledge-based process: the classification relies mainly on differentiating colors, tones, shapes, and spatial association (Outcalt and Benedict 1965). Because it depends on human interpretation, it can also suffer a high rate of misclassification.

Fig. 20.14

Examples of rock outcrops

5.4.1 Deep Learning Method to Identify Rock Outcrops in Hong Kong

In order to reduce the potential bias of pixel-based RS applications, object-based techniques have been developed. An innovative methodology combining the deep learning technique of convolutional neural networks (CNNs) with RS techniques was developed to balance the trade-off between spatial resolution and spectral resolution for mapping rock outcrops in Hong Kong.

Five target land-cover types were selected as training and testing samples in this study: rock outcrops, grassland, tree, badland, and urban. Examples of rock outcrops are shown in Fig. 20.14. The samples were used to train a 16-layer VGGNet (Simonyan and Zisserman 2014) initialized with a model pre-trained on ImageNet. Training accuracy increased markedly from around 50% in the first epoch to 80% by the third epoch, and then increased steadily until the end of training. Testing accuracy increased from about 70% in the first epoch to 90% by the 20th epoch, and then oscillated between 90% and 92% until the end of training, indicating no further improvement after the 20th epoch. The trained network therefore provides land-cover classification accuracy above 90% on both the training and testing sets.

After training, the trained network was applied to the digital orthophotos (DOPs) covering the whole Hong Kong territory. For each DOP, a 20 × 20 m kernel was input into the CNN for classification, and the probability of that kernel belonging to rock outcrops was predicted. The land-cover classification map (Fig. 20.15) and the rock-outcrop probability map (Fig. 20.16) were then generated, and finally the rock-outcrop map of Hong Kong (Fig. 20.17) was produced.
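The kernel-wise prediction step can be sketched as follows, with the trained CNN replaced by a hypothetical stand-in scorer and a toy 4 × 4 "image"; the per-kernel rock-outcrop probabilities form a coarse probability map over the orthophoto.

```python
# Tile an image into fixed-size kernels and score each one. The scorer below
# is a placeholder (mean pixel value), standing in for the trained CNN,
# which is not reproduced here.

def predict_outcrop_prob(tile):
    """Hypothetical stand-in for the CNN: score a tile by its mean value."""
    flat = [v for row in tile for v in row]
    return sum(flat) / len(flat)

def probability_map(image, kernel=2):
    """Tile `image` (2D list) into kernel x kernel blocks and score each."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - kernel + 1, kernel):
        out_row = []
        for c in range(0, cols - kernel + 1, kernel):
            tile = [row[c:c + kernel] for row in image[r:r + kernel]]
            out_row.append(predict_outcrop_prob(tile))
        out.append(out_row)
    return out

image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
pmap = probability_map(image)   # [[1.0, 0.0], [0.0, 0.5]]
```

In the study the kernel corresponds to a 20 × 20 m ground footprint, and thresholding the probability map yields the final rock-outcrop map.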

Fig. 20.15

Classification result of High West, Hong Kong Island with the year 2015 DOP

Fig. 20.16

Probability map of rock outcrops of High West, Hong Kong Island with the year 2015 DOP

Fig. 20.17

A rock outcrops map of Hong Kong

6 Summary

Presently, the development of smart cities is highly dependent on spatial information derived from remote sensing technologies. Before modern tools and techniques are used, however, knowledge of the characteristics of remote sensing datasets, interpretation theories, automatic extraction of urban objects, and the problems associated with these methods is essential; these topics have been discussed thoroughly in this chapter. With the advent of very-fine-resolution images, contemporary research is focused on information extraction using big data analytics, owing to the huge volume of data with ever finer spatial, spectral, and temporal resolution. In addition, analysis paradigms are shifting towards high geometric precision and vertical development; the trade-off between spectral and spatial information in remote sensing datasets; automatic object-oriented feature extraction to update changes in urban space; the development of urban spectral libraries from image spectroscopy to detect and classify numerous urban surface materials; cutting-edge technologies for 3D building generation from LiDAR point clouds; land-use classification along the vertical surfaces of skyscrapers; the dynamics of urban sprawl and population migration driven by economic development; population estimation from satellite images; sustainable urban ecology in the context of future development; disaster-risk reduction in the context of extreme weather events and earthquakes; urban noise- and air-pollution monitoring; urban trees and biodiversity for environmental conservation; and smart transportation systems. Thus, the enormous amount of remote sensing data, together with big data analytics, will be the backbone of the geospatial cyberinfrastructure required for the development of future smart cities.