1 Introduction

Core computer vision tasks, such as navigation and obstacle detection, are complicated by the presence of shadows, which can be misinterpreted as objects or obstacles. Several methods have been proposed to address shadow detection and removal in various application domains, including satellite imaging, traffic monitoring, and obstacle detection in navigation systems [4, 28, 52, 57, 59]. Figure 1 shows example images illustrating objects in shadowed environments.

Fig. 1 Example images illustrating objects in shadowed environments (a-d) from various publicly available shadow datasets considered in this study [4, 16, 17]

The shadow problem is commonly treated using the Shadow Model Theory (SMT) proposed by Barrow et al. [6], which enables the calculation of the intensity of light reflected at a given point on a surface. It is a physics-based model that decomposes illumination into two components: direct and ambient illumination. According to Guo et al. [23], the illumination model for an RGB image is given by:

$${I}_i=\left({t}_i\cos {\theta}_i\ {L}_d+{L}_e\right){R}_i$$
(1)

where Ii is the color intensity of pixel i in the R, G and B color channels, Ri is the surface reflectance at pixel i, Ld and Le are the light intensities associated with the direct light source and the ambient sources respectively, θi is the angle between the direct lighting direction and the surface normal, and ti ∈ [0, 1] is a variable denoting the attenuation of the direct light reaching pixel i (ti = 1 in fully lit areas and ti = 0 in the umbra).
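To make the role of Eq. (1) concrete, the following short derivation shows how a relighting coefficient arises from the model; this is a standard consequence of SMT, as exploited e.g. in [23], rather than a result specific to this work. A fully lit observation of the same surface point corresponds to ti = 1, so

$${I}_i^{lit}=\left(\cos {\theta}_i\ {L}_d+{L}_e\right){R}_i\kern1em \Rightarrow \kern1em \frac{I_i^{lit}}{I_i}=\frac{\cos {\theta}_i\ {L}_d+{L}_e}{t_i\cos {\theta}_i\ {L}_d+{L}_e}$$

Multiplying the intensity of a shadowed pixel by this ratio cancels the unknown reflectance Ri, which is why SMT-based methods reduce shadow removal to estimating an illumination coefficient per pixel or per region.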

Early shadow removal methods, such as [7, 17,18,19, 36, 41], were also based on models relying on a number of assumptions, and were usually evaluated on datasets with a small number of images, focusing mainly on qualitative aspects. Later, there was a decisive shift towards supervised methods, such as deep neural networks, which brought new trends, but also the requirement for large training sets. New benchmark datasets were created to meet this requirement, and evaluations began to also encompass quantitative aspects [33, 46, 50, 53]. A drawback is that the shadows in these datasets must be manually annotated, which is time-consuming and usually costly. Furthermore, deep neural networks are usually computationally expensive and demanding in terms of computational resources.

The state-of-the-art shadow detection and removal approaches can be divided into two categories: a) unsupervised methods, and b) supervised methods. Most methods of both categories build on the SMT and use a coefficient to indicate the intensity in the shadow areas based on Eq. (1). This coefficient describes the brightness reduction in relation to the shadow-free areas of the image [15, 35, 47]. The goal is to find the right parameters that, when multiplied by the intensity of each pixel in the shadow region, recover the initial illumination of the shadowed areas. Unsupervised methods are usually based on intrinsic image features, such as color and texture, and on strategies that enable the recovery of detail and luminosity in shadow regions [5, 12, 16,17,18,19, 22, 25, 29, 31, 36, 41, 60, 63]. Supervised methods are mainly based on complex deep learning architectures, and they usually provide results of higher quality than the unsupervised ones [1, 9, 14, 26, 27, 33, 34, 37, 42, 53, 56, 64].

This work aims to address the need for both efficient and effective shadow removal in computer vision workflows, by the following contributions:

  • A novel, fully unsupervised shadow removal method, named Simple Unsupervised Shadow Removal (SUShe), which is based solely on color image features. Unlike previous methods, it is very simple, both in terms of implementation and computational complexity, while obtaining results that are comparable to state-of-the-art deep learning-based methods.

  • A unique shadow segmentation approach efficiently combining a physics-inspired optimization algorithm, superpixel segmentation, and histogram matching, in the context of a lightweight pre-processing function.

  • An extensive experimental study on various publicly available benchmark datasets, highlighting the trade-off between efficiency and effectiveness that the method offers.

The remainder of this paper is organized into six sections. Section 2 provides an overview of related work, and Section 3 details the proposed methodology. Section 4 provides information on the experimental setup and the evaluation framework. Section 5 presents results of the proposed method in comparison with the most relevant state-of-the-art methods. Section 6 provides a perspective in terms of computational complexity. Section 7 provides a discussion of the experimental results, and the main conclusions of this work as well.

2 Related work

Several shadow detection and removal methods have been based on SMT. This theory is the foundation of most relighting methods published in the last decade. Still, it is incomplete, in the sense that it fails to accurately model umbra regions and thus to enable correct relighting in the proximity of shadow borders. Apart from those based on the SMT, several other works have been proposed for shadow removal. These include model-based unsupervised methods, such as the method proposed in [19], where each RGB image is projected onto a 1D invariant direction, in order to recover hues by means of a 2D chromaticity feature space. In [47], a pyramid-based process is employed for shadow removal with user assistance. In [36], the main aim was shadow removal in a way that is robust against texture variations. Later methods, such as [60], were based on classical machine learning techniques that use engineered features, such as texture elements, and aim to match regions under different lighting conditions [22, 23]. In [12], clustering-based shadow detection relied on color and texture features, whereas in [41] a shadow removal method was proposed for images with uniform background, in which shadow and lit regions were separated by ignoring the low-frequency image details. Also, an unsupervised shadow removal method using differential operations within a recent osmosis model was proposed in [7]. However, most of the aforementioned methods have been tested only on subsets of benchmark datasets, because they are usually applicable only to images with specific types of textures and features.

Recently, the focus of research on shadow detection and removal has turned to supervised deep learning-based architectures, such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs). In [33], a shadow-free image is generated using two deep networks, SP-Net and M-Net. In [34], a GAN-based framework was trained with patches extracted from shadow images, following the physics-inspired model of [33]. In [1], Channel Attention GAN detects and removes shadows using two networks, which consider physical shadow properties and equipment parameters. Another network, called G2R-ShadowNet, consists of three subnetworks and requires a small number of images for training [38]. The Stacked Conditional Generative Adversarial Network (ST-CGAN) [53] combines two stacked conditional GANs, which provide generators for the shadow detection mask and for the shadow-free image. In [20], shadow removal was treated as an image fusion problem via FusionNet, a network that generates weight maps facilitating the fusion process. Feature fusion was also employed in [9], integrated with multiple dictionary learning. In recent years, several studies have also aimed to increase the effectiveness of shadow removal on the benchmark shadow datasets. In [26], Hu et al. proposed a shadow removal architecture aiming to learn direction-aware and spatial characteristics of the images at various levels, using a CNN. Additionally, Hu et al. proposed a weighted cross-entropy loss to train a neural network for shadow detection; that method addressed color and luminosity inconsistencies in the training pairs for shadow removal by applying a color transfer function. In [62], a framework named RIS-GAN was proposed to investigate residual images and illumination estimation with GANs for shadow removal. Indirect shadow-removal images were created by estimating negative residual images and inverse illumination maps, which were combined with the coarse shadow-removal image to refine the final shadow-free result. In [11], the shadow removal problem was approached in two ways. Firstly, a dual hierarchical aggregation network was proposed to carefully learn the border artifacts in a shadowed image; without any down-sampling, a foundation of dilated convolutions was considered for attention estimation, using multi-context information. Secondly, taking into account that training on a small dataset limits the network's ability to recognize textural differences, resulting in color inconsistencies in the shadowed region, the authors developed a dataset synthesis method based on shadow matting. In [10], a two-stage context-aware network, called CANet, was proposed for shadow removal; the shadow regions receive contextual information from the corresponding non-shadowed regions, and an encoder-decoder is then used to enhance the results. Mask-ShadowNet was proposed in [24], where a masked adaptive instance normalization method along with embedded aligners was applied to remove shadows, considering illumination uniformity and the different feature statistics in the shadow and non-shadow areas. In [65], a Bidirectional Mapping Network was presented, combining the learning processes of shadow removal and shadow generation into a unified parameter-sharing framework.
In [51], a style-guided shadow removal network was proposed to address the issue of visually disharmonious images after shadow removal, and to ensure better image style coherence. The training of all these deep learning-based methods is associated with a) a high computational cost, b) non-trivial hardware specifications, and c) a requirement for a large number of annotated images.

Shadow removal is useful in a variety of computer vision applications, such as the detection of moving objects and pedestrians in indoor or outdoor environments [29, 31, 63], navigation-aid systems [44], and the recognition of regions of interest in remote sensing images [26, 27]. In [49], an automatic shadow mask estimation approach was introduced, aiming to replace manual labeling in a supervised context, using known solar angles and 3D point clouds. Shadow removal can be an essential component of remote sensing object detection algorithms, helping to cope with several challenges, such as complex backgrounds and variations of scale and density [39, 55]. Wang et al. [54] proposed an automatic cloud shadow screening mechanism, which was utilized for PlanetScope, a constellation of over 130 small satellites that can regularly image the entire surface of the Earth. In an unsupervised context, a statistical method based on decision trees [3] was proposed for aerial imaging.

Unlike current shadow removal methods, whether unsupervised or supervised, this work provides a very simple methodology for automatic shadow removal, based on a novel combination of superpixel segmentation with a strategy for matching shadow and lit regions.

3 Proposed methodology

The proposed methodology is based on a simple, yet very effective strategy. Initially, the shadow mask is extracted using an evolutionary physics-inspired algorithm. Next, shadowed and non-shadowed image regions that are coherent in terms of texture and color are identified, and shadow/non-shadow pairs of neighboring superpixels adjacent to shadow borders are determined. The shadowed part of each pair is relighted by means of histogram matching.

3.1 Shadow detection

Shadow detection refers to the segmentation of a natural image, in either indoor or outdoor settings, in order to extract the shadowed region. Algorithm 1 summarizes the shadow detection stage in pseudocode, and Fig. 2 presents a visual summary of this algorithm; a simplified sketch of the pipeline is also given after Algorithm 1. The Electromagnetism-like Optimization (EMO) algorithm [32, 45, 48] is employed for multilevel segmentation, aiming to cope with the issue of computational complexity. Initially, the color space used to represent the input image is converted from RGB to HSV (line 2). The Hue (H) component of the input image is segmented using the EMO-based method described in [8], considering that the H component is invariant to changes in lighting. A set of k images hi, i = 1, 2, …, k, is the output of this operation, representing roughly hue-homogeneous regions (line 3). As a next step, the Value (V) component of the HSV image is multiplied by each image hi, i = 1, 2, …, k, resulting in a series of new images vi = hi · V, i = 1, 2, …, k, which represent regions with weighted intensities (line 5). This weighting is performed because the generated image regions have lower intensities in shadowed areas; thus, the subsequent bilevel thresholding step is facilitated. Bilevel thresholding is performed using EMO on each vi, i = 1, 2, …, k, with only one threshold to be optimized. The result of this operation is a set of k binary images, bi, i = 1, 2, …, k (line 6). In these images, the pixels corresponding to lower intensities (potential shadowed regions) are set to white, and the remaining pixels are set to black. As a final step, the binary masks bi, i = 1, 2, …, k, obtained from each vi, are aggregated to create a mask B representing the shadowed regions of the input image (line 7).

Fig. 2 Outline of the shadow detection algorithm

Algorithm 1 Shadow detection pseudocode
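The following is a minimal sketch of this detection pipeline, assuming an RGB input in [0, 1]. Since EMO has no standard library implementation, multi-Otsu and Otsu thresholding stand in for the EMO-based multilevel and bilevel steps; all function and variable names are illustrative, not taken from the original code.

```python
import numpy as np
from skimage.color import rgb2hsv
from skimage.filters import threshold_multiotsu, threshold_otsu

def detect_shadow_mask(rgb, k=4):
    hsv = rgb2hsv(rgb)                          # line 2: RGB -> HSV
    H, V = hsv[..., 0], hsv[..., 2]

    # Line 3: multilevel segmentation of the Hue channel into k regions
    # (multi-Otsu as a stand-in for the EMO-based method of [8]).
    thresholds = threshold_multiotsu(H, classes=k)
    labels = np.digitize(H, bins=thresholds)    # region index per pixel

    B = np.zeros(V.shape, dtype=bool)
    for i in range(k):
        region = labels == i                    # i-th hue-homogeneous region
        if not region.any():
            continue
        # Lines 5-6: restricting V to the region plays the role of the
        # weighting vi = hi * V; Otsu stands in for single-threshold EMO.
        t = threshold_otsu(V[region])
        b_i = region & (V < t)                  # darker pixels: shadow candidates
        B |= b_i                                # line 7: aggregate the masks
    return B
```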

3.2 Superpixel matching strategy

Following the application of the shadow detection algorithm, SUShe performs unsupervised shadow removal on image regions with approximately uniform color features. These regions, known as superpixels, are obtained using the SLIC superpixel segmentation algorithm. A superpixel matching strategy is then applied to identify superpixels in the shadow areas that are similar to superpixels in the non-shadow areas. Relighting is performed by transforming the histogram of each shadow superpixel, so that it matches the histogram of the respective non-shadow superpixel.

SLIC Superpixel segmentation

The Simple Linear Iterative Clustering (SLIC) superpixel segmentation algorithm performs local clustering of pixels, considering both color and spatial information, by means of a metric proposed in [2]. The algorithm takes as input an image and the number of superpixels K into which the input image should be divided. The input RGB image is divided into K roughly equal grid cells in the xy plane, with grid interval S. Each pixel i has spatial coordinates (xi, yi) and color coordinates (Li, ai, bi) in the CIE-Lab color space. Each grid cell is assigned a superpixel center Ck = [Lk, ak, bk, xk, yk], so that the superpixels are similar in size. For each pixel (xi, yi, Li, ai, bi), the distances to the superpixel center Ck given by Eqs. (2) and (3) are calculated, in order to define the metric provided in Eq. (4):

$${d}_{Lab}=\sqrt{{\left({L}_k-{L}_i\right)}^2+{\left({a}_k-{a}_i\right)}^2+{\left({b}_k-{b}_i\right)}^2}$$
(2)
$${d}_{xy}=\sqrt{{\left({x}_k-{x}_i\right)}^2+{\left({y}_k-{y}_i\right)}^2}$$
(3)
$${D}_s=\sqrt{d_{Lab}^2+\frac{m^2}{S^2}{d}_{xy}^2}$$
(4)

where Ds is the final metric, which combines the Euclidean color distance dLab (in CIE-Lab) and the Euclidean spatial distance dxy, normalized by the grid interval S; m is a parameter of the SLIC algorithm that controls the compactness of the superpixels, set to its default value of 10 according to [2]. Each cluster center Ck is assigned the best-matching pixels from the surrounding 2S × 2S area, according to the distance metric Ds. This process is iterated until convergence.
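As a quick illustration, the snippet below computes the distance of Eqs. (2)-(4) for a single pixel/center pair, and notes the equivalent off-the-shelf call; this is a hedged sketch with illustrative names, not the paper's implementation.

```python
import numpy as np
from skimage.segmentation import slic

def slic_distance(pixel, center, S, m=10.0):
    """pixel, center: arrays [L, a, b, x, y]; S: grid interval."""
    d_lab = np.linalg.norm(pixel[:3] - center[:3])       # Eq. (2)
    d_xy = np.linalg.norm(pixel[3:] - center[3:])        # Eq. (3)
    return np.sqrt(d_lab**2 + (m**2 / S**2) * d_xy**2)   # Eq. (4)

# In practice, the full iterative clustering is available directly, e.g.:
# labels = slic(rgb_image, n_segments=K, compactness=10)
```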

SUShe: Simple unsupervised shadow removal

The proposed methodology combines two very simple techniques for region segmentation and for relighting the shadowed areas. The SLIC superpixel algorithm is used to segment the input image into many small regions that are approximately uniform with respect to color. Algorithm 2 summarizes the shadow removal stage in pseudocode, and Fig. 3 presents a visual summary.

  • Initially, the binary shadow mask B (obtained in Subsection 3.1) is used to split the input image I (line 1) into a shadowed region IS and a lit region IL (line 2). These regions are obtained as IS = B · I and IL = (1 − B) · I.

  • Next, SLIC superpixel segmentation is applied to IS and IL separately (line 3): IS and IL are partitioned into the shadowed superpixels (ISLIC(S)) and the lit superpixels (ISLIC(L)), with respect to the color and spatial features of each region.

  • The spatial gravity centers \(G{C}_{shado{w}_i}, i=1,2,\dots, K\), of the shadow superpixels \(I_{SLIC(S)_i}\) and the corresponding gravity centers \(G{C}_{li{t}_j}, j=1,2,\dots, K\), of the lit superpixels are calculated (line 4), where K is the number of superpixels (line 1).

  • In each channel of the RGB color space (line 6), for each gravity center \(G{C}_{shado{w}_i}\) of \(I_{SLIC(S)_i}\), the Euclidean spatial distance \(d\equiv d\left({GC}_{shado{w}_i},{GC}_{li{t}_j}\right)\) to each lit gravity center \(G{C}_{li{t}_j}\) of superpixel \(I_{SLIC(L)_j}\) is calculated (line 8), in order to find the minimum one (line 9). The minimum spatial distance identifies the lit superpixel that will relight the shadowed superpixel located at \({GC}_{shado{w}_i}\).

  • The lit superpixel pairi that has the minimum distance d from the shadowed one, \(I_{SLIC(S)_i}\), is considered the optimal counterpart of the respective shadow superpixel.

  • Next, the histograms and the corresponding cumulative distribution functions (cdfs) of the shadow superpixel \(I_{SLIC(S)_i}\) and of its pairi are calculated (line 10).

  • Histogram matching [40] is performed on \(I_{SLIC(S)_i}\) to transform the shadow histogram, so that it matches the corresponding lit cdf of pairi. The shadow superpixels are thus relighted, using the color values of the lit counterpart pairi (line 11).

  • These steps are performed iteratively for all shadow superpixels \(I_{SLIC(S)_i}, i=1,\dots, K\).

  • Finally, all the relighted shadow superpixels are merged to form the relighted region IR (line 12). After completing these iterations, the relighted region IR is merged with the initial lit region IL (line 13).

Fig. 3 Illustration of the proposed SUShe shadow removal framework

The entire process is repeated three times, for the R, G, and B channels, and the results are concatenated (line 14) to produce the final shadow-free image Inon-shadow (line 15). A simplified end-to-end sketch follows Algorithm 2.

Algorithm 2 Simple Unsupervised Shadow Removal (SUShe) pseudocode
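The sketch below outlines the whole removal stage under simplifying assumptions: the shadow mask B comes from the detection stage, masked SLIC approximates the separate segmentation of IS and IL, and skimage's match_histograms stands in for the per-channel cdf matching of [40]. All names are illustrative, not the authors' code.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.exposure import match_histograms

def sushe_removal(image, B, K=70):
    """image: (H, W, 3) RGB array; B: (H, W) boolean shadow mask."""
    out = image.astype(float).copy()

    # Lines 2-3: superpixels computed separately on the shadowed and lit
    # regions (label 0 marks pixels outside each mask).
    labels_s = slic(image, n_segments=K, compactness=10, mask=B)
    labels_l = slic(image, n_segments=K, compactness=10, mask=~B)

    # Line 4: spatial gravity centers of the lit superpixels.
    ys, xs = np.indices(B.shape)
    lit_ids = [l for l in np.unique(labels_l) if l != 0]
    if not lit_ids:
        return image.copy()             # no lit region to match against
    lit_gc = {l: np.array([ys[labels_l == l].mean(), xs[labels_l == l].mean()])
              for l in lit_ids}

    for s in np.unique(labels_s):
        if s == 0:
            continue
        sp = labels_s == s
        gc = np.array([ys[sp].mean(), xs[sp].mean()])
        # Lines 8-9: the nearest lit superpixel is the optimal counterpart.
        pair = min(lit_ids, key=lambda l: np.linalg.norm(gc - lit_gc[l]))
        ref = labels_l == pair
        # Lines 6, 10-11: per-channel histogram matching relights the superpixel.
        for c in range(3):
            out[sp, c] = match_histograms(image[sp, c].astype(float),
                                          image[ref, c].astype(float))
    # Lines 12-15: relighted superpixels were written in place, so `out`
    # already merges the relighted region IR with the lit region IL.
    return out.astype(image.dtype)
```

With K = 70, the sketch mirrors the configuration that performs best overall in the experiments of Section 5.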

4 Evaluation

4.1 Experimental setup and datasets

The proposed methodology and all the experiments have been implemented in MATLAB R2019a, on an AMD Ryzen 7 5800H at 3.2 GHz, with 16 GB RAM. The experimental evaluation has been based on three benchmark datasets, namely the Image Shadow Triplets Dataset (ISTD), the Adjusted Image Shadow Triplets Dataset (AISTD), and the Shadow Removal Dataset (SRD). ISTD is the most challenging shadow dataset employed in state-of-the-art works. It consists of 2410 triplets, each comprising the initial RGB image with shadows, the shadow mask, and the ground truth RGB shadow-free image. ISTD is divided into two subsets: the first is composed of 1870 images for training, and the second contains 540 images for testing. Each image has a size of 480 × 640 pixels. Another well-recognized dataset is AISTD, an improved version of ISTD described in [33], with 1870 training and 540 testing images. Experiments were also performed using the SRD dataset, proposed in [46], which consists of 3088 images in total, of which 2680 are used for training and 408 for testing. SRD includes images of various scenes, illumination conditions and object types, in order to enable the investigation of various shadow and reflectance phenomena.

4.2 Evaluation metrics

The results of the proposed method have been evaluated both quantitatively and qualitatively. The quantitative evaluation was based on the Root Mean-Squared Error (RMSE) and the Peak Signal-to-Noise Ratio (PSNR). The RMSE between two given images has been calculated for the shadowed area, the non-shadow area, and all areas, using the evaluation code proposed in [21], which has also been used in major state-of-the-art works, such as [33, 34, 53]. In that code, the RMSE is computed as:

$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^n{\left(G{T}_i- Outpu{t}_i\right)}^2}$$
(5)

where GT is the ground truth image, Output is the predicted shadow-free image, i = 1, …, n is the index of each pixel in the area of interest (i.e., shadow, non-shadow, or all areas), and n is the total number of pixels in that area.

The Mean-Squared Error (MSE) between the output of the shadow removal and the ground truth without shadows is calculated by:

$$MSE=\frac{1}{n}\sum_{i=1}^n{\left(G{T}_i- Outpu{t}_i\right)}^2$$
(6)

PSNR has been calculated from the MSE of Eq. (6) as:

$$PSNR=10{\log}_{10}\left(\frac{M^2}{MSE}\right)$$
(7)

where M is the maximum pixel value in the area of interest. PSNR is measured in decibels (dB). A higher PSNR value indicates higher output image quality (↑). The RMSE decreases as the output image becomes more similar to the ground truth; therefore, a lower RMSE indicates improved output quality (↓).

Furthermore, we have also assessed our experiments using the Learned Perceptual Image Patch Similarity (LPIPS) metric, which closely matches human perception [61]. LPIPS computes the distance between the activations of two image patches in a predefined network; a low LPIPS score indicates high perceived similarity between the patches (↓).
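For reference, a minimal NumPy version of the metrics of Eqs. (5)-(7) is given below; this is a hedged sketch, not the original evaluation code of [21]. The mask argument selects the area of interest (shadow, non-shadow, or all areas).

```python
import numpy as np

def rmse(gt, output, mask):
    diff = gt[mask].astype(float) - output[mask].astype(float)
    return np.sqrt(np.mean(diff ** 2))              # Eq. (5)

def psnr(gt, output, mask):
    diff = gt[mask].astype(float) - output[mask].astype(float)
    mse = np.mean(diff ** 2)                        # Eq. (6)
    M = gt[mask].max()                              # max value in the area of interest
    return 10 * np.log10(M ** 2 / mse)              # Eq. (7)

# LPIPS is available through the `lpips` package (inputs as torch tensors
# scaled to [-1, 1]), e.g. lpips.LPIPS(net='alex')(img0, img1).
```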

5 Results

In this section, quantitative and qualitative results of the proposed methodology are presented. Different values of K were tested to find the most appropriate superpixel segmentation level; specifically, K = 70, 80, 90, 100, 400 and 700 superpixels. Tables 1, 2 and 3 summarize the results obtained by SUShe on the ISTD, AISTD and SRD datasets, respectively (the best results are indicated in bold). In Table 1 it can be noted that by setting K = 90 (PSNR = 24.82, RMSE = 8.14, LPIPS = 0.079), SUShe achieves the best shadow removal results in ISTD. The second best results were obtained for K = 70 (PSNR = 24.84, RMSE = 8.15, LPIPS = 0.078). For K = 100, K = 400 and K = 700 (mean values approximately equal to PSNR = 24.73, RMSE = 8.17 and LPIPS = 0.084) the results are comparable. For K = 80 the lowest performance is obtained. In the case of AISTD (Table 2), the best results of SUShe are also obtained for the lowest values of K in the range tested, i.e., K = 70 (PSNR = 30.09, RMSE = 4.12, LPIPS = 0.076) and K = 90 (PSNR = 30.06, RMSE = 4.11, LPIPS = 0.076). Again, for larger values of K, i.e., K = 100, K = 400 and K = 700, the results are comparable to each other (mean values approximately equal to PSNR = 29.60, RMSE = 4.24, LPIPS = 0.083). Overall, K = 70 yields the best results across the datasets used for evaluation. In the case of SRD, the optimal results, obtained for K = 70, lead to the lowest RMSE score (PSNR = 22.05, RMSE = 8.68, LPIPS = 0.167). The values K = 80 and 90 lead to slightly inferior accuracy. It can be observed that higher values of K are not linked with higher quality results.

Table 1 Quantitative results of the proposed methodology for different superpixel values in ISTD
Table 2 Quantitative results of the proposed methodology for different superpixel values in AISTD
Table 3 Quantitative results of the proposed methodology for different superpixel values in SRD

Figure 4 indicates that the proposed methodology is relatively insensitive to K; especially in terms of LPIPS, the results are comparable across different values of K. Still, performance is slightly better for the lowest values of K in the range tested.

Fig. 4 LPIPS sensitivity for different K values (a) in ISTD, (b) in AISTD and (c) in SRD

Tables 4, 5 and 6 present experimental comparisons between SUShe (indicated in bold) and state-of-the-art shadow removal algorithms; the results reported for the latter are derived from the literature. RMSE is computed in three ways: inside the shadowed region (Shadow), outside the shadowed region (Non-shadow), and over the entire image (All regions). Figures 5, 6 and 7 illustrate qualitative results of SUShe and other state-of-the-art algorithms, including all the unsupervised methods and those supervised methods whose implementations have been made publicly available by their authors.

Table 4 Quantitative results in comparison with other state-of-the-art methodologies for ISTD
Table 5 Quantitative results in comparison with other state-of-the-art methodologies for AISTD
Table 6 Quantitative results in comparison with other state-of-the-art methodologies for SRD
Fig. 5 Comparative results on the ISTD (from Table 4), in terms of RMSE, PSNR, and LPIPS metrics. The horizontal lines represent the mean scores obtained from all the previously proposed methods per metric

Fig. 6 Comparative results on the AISTD (from Table 5), in terms of RMSE, PSNR, and LPIPS metrics. The horizontal lines represent the mean scores obtained from all the previously proposed methods per metric

Fig. 7 Comparative results on the SRD (from Table 6), in terms of RMSE, PSNR, and LPIPS metrics. The horizontal lines represent the mean scores obtained from all the previously proposed methods per metric

Table 4 presents comparisons on ISTD. SUShe outperforms all non-neural network-based methods ([21, 23, 58]), as it achieves the best results among them in terms of all metrics (PSNR = 24.82, RMSE = 8.14, LPIPS = 0.079). The methods of Guo et al. [23] and Gong et al. [21] involve supervision in the context of shadow detection and removal: Guo et al. use pairwise classification for shadow removal, whereas the method of Gong et al. requires indication of the shadowed and lit areas through a GUI tool to perform shadow detection. In addition, SUShe outperforms several state-of-the-art neural network-based methods: ARGAN [14], Cycle-GAN [64], the method proposed by Nagae et al. [43], as well as the well-known SP + M Net [33] and DHAN [11] (Table 4). The remaining neural network-based methods achieve RMSE values that are lower than the RMSE of SUShe. Still, SUShe obtains LPIPS = 0.079 on ISTD, which is the lowest value with the exception of ST-CGAN. Approaches such as [20, 26, 27, 30, 33, 37, 46] lead to results comparable to SUShe (with a difference in RMSE not exceeding 2.0); however, these approaches require training. Figure 8 illustrates indicative shadow removal results of SUShe and the compared state-of-the-art methods. As can be observed in Fig. 8b or e, some methods completely alter the image, while others fail to completely remove the shadow (Fig. 8c, d, f, j). In addition, in Fig. 8h, the pairing of shadowed/non-shadowed regions is erroneous, leading to relighting from an erroneous non-shadow area and, eventually, to incorrect brightness restoration. SUShe is the only completely unsupervised methodology with a satisfactory performance across the entire ISTD. The comparative results on ISTD are graphically represented in Fig. 5.

Fig. 8 Indicative results of SUShe and other state-of-the-art methods in ISTD. a Shadow Image, b Yang et al. (2012) [58], c Guo et al. (2012) [23], d Gong et al. (2016) [21], e ST-CGAN (2018) [53], f DSC (2020) [26], g SP + M Net (2019) [33], h DC-ShadowNet (2021) [30], i Fu et al. (2021) [20], j LG-ShadowNet (2021) [37], k DHAN (2020) [11], l Ground Truth Image, m SUShe

Table 5 presents comparisons between SUShe and state-of-the-art shadow removal methods for AISTD. In this case, SUShe ranks second with respect to PSNR and LPIPS, after SP + M Net and SG-ShadowNet, and third with respect to RMSE. The difference between the results of SUShe and those of these methods is notably small, given the higher computational complexity of the latter. Figure 9 illustrates comparative results on images from AISTD. Once again, the methods proposed by Guo et al. and Gong et al., as well as DC-ShadowNet and LG-ShadowNet (Fig. 9c-e, h), fail to remove the shadow in the second and third images (center and right columns), whereas the method of Yang et al. (Fig. 9b) alters the image both inside and outside the shadow regions. The best results are obtained by SUShe, whereas comparable results are obtained by SG-ShadowNet, DHAN, and the method of Fu et al. (Fig. 9f-g, i). The comparative results on AISTD are also graphically represented in Fig. 6.

Fig. 9 Indicative results of SUShe and other state-of-the-art methods in AISTD. a Shadow Image, b Yang et al. (2012) [58], c Guo et al. (2012) [23], d Gong et al. (2016) [21], e DC-ShadowNet (2021) [30], f Fu et al. (2021) [20], g SG-ShadowNet (2022) [51], h LG-ShadowNet (2021) [37], i DHAN (2020) [11], j Ground Truth Image, k SUShe

Table 6 presents comparisons on SRD. In this case, SUShe outperforms all unsupervised methods, and notably outperforms Cycle-GAN and the method of Nagae et al. as well. Furthermore, the performance of SUShe is comparable to the performances of DHAN (RMSE = 8.68) and DeShadowNet [46] in terms of LPIPS. Figure 10 illustrates indicative qualitative results of SUShe and other state-of-the-art algorithms, including supervised methods, on images from SRD. In the case of the first image of Fig. 10 (left), the output of SUShe is obviously closer to the ground truth than those of the rest of the methods. In the case of the second image of Fig. 10 (right), the result of SUShe is comparable to those of DeShadowNet, ARGAN, DSC, DC-ShadowNet, Fu et al., and SG-ShadowNet. It is also worth noting that SUShe preserves the details of the original image, unlike ST-CGAN, which introduces blur artifacts. Overall, SUShe achieves shadow removal results comparable to those of several supervised methods. The comparative results on SRD are also graphically represented in Fig. 7.

Fig. 10 Indicative results of SUShe and other state-of-the-art methods in SRD. a Shadow Image, b DeShadowNet (2017) [46], c ST-CGAN (2018) [53], d ARGAN (2019) [14], e DSC (2020) [26], f DC-ShadowNet (2021) [30], g Fu et al. (2021) [20], h SG-ShadowNet (2022) [51], i DHAN (2020) [11], j Ground Truth Image, k SUShe

6 Computational complexity

The computational complexity of SUShe is estimated in order to quantitatively assess its efficiency. More specifically:

a) SLIC superpixels are used for the segmentation of both lit and shadowed regions, employing color and spatial coordinates. SLIC avoids tens of thousands of redundant point-to-point distance calculations by localizing the search in the clustering process. SLIC is O(N), where N is the number of pixels in the image [2].

b) For each of the c = 3 RGB channels, the following operations are performed:

  • Histogram calculation of an image is O(N), since N = width × height.

  • Calculation of the gravity centers of the lit superpixels is O(K), where K is the number of lit superpixels.

  • Scanning the K shadow superpixels is O(K). For each shadow superpixel, calculating the Euclidean distances to all lit superpixels and finding the minimum is O(K), so the matching step amounts to O(K²) overall.

  • The calculation of the cdfs is approximately O(N).

Histogram matching amounts to O(G²), where G is the number of grey levels (e.g., G = 256).
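The quadratic cost in G comes from constructing the matching look-up table: for each of the G source grey levels, the closest reference cdf value is searched among up to G levels. A minimal sketch of this construction (a hypothetical helper, not the paper's code) is:

```python
import numpy as np

def build_matching_lut(cdf_src, cdf_ref):
    """cdf_src, cdf_ref: arrays of length G with values in [0, 1]."""
    G = len(cdf_src)                    # number of grey levels, e.g. 256
    lut = np.zeros(G, dtype=np.int64)
    for s in range(G):                  # O(G) outer loop
        # O(G) inner search -> O(G^2) in total
        lut[s] = np.argmin(np.abs(cdf_ref - cdf_src[s]))
    return lut                          # maps source levels to matched levels
```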

A quantitative comparison of the number of floating-point operations (FLOPs) per image between SUShe and the available deep learning-based methods is presented in Table 7 and Fig. 11. It can be noticed that SUShe has a significantly lower FLOPs count (indicated in bold).

Table 7 The number of FLOPs per image for different shadow removal methods
Fig. 11 The number of FLOPs per image (from Table 7) for different shadow removal methods

7 Discussion and conclusions

This work investigated a simple, efficient and effective solution to the complex problem of shadow removal, which can affect object detection and recognition algorithms and deteriorate their performance. The experimental results showed that by combining simple segmentation and color enhancement algorithms, the original brightness of shadowed regions can be restored. This was validated by quantitative and qualitative comparisons with both unsupervised and supervised state-of-the-art methods. All experiments were performed on three widely adopted, publicly available benchmark datasets. From Tables 4, 5 and 6 and Figs. 5, 6 and 7, it is evident that SUShe outperforms all the state-of-the-art unsupervised methods compared, as well as some supervised ones (Cycle-GAN, ARGAN, Nagae et al., DHAN, LG-ShadowNet, DC-ShadowNet). As for those that SUShe does not outperform, such as SG-ShadowNet, the results of SUShe are comparable both quantitatively and qualitatively (Figs. 8, 9 and 10). To the best of our knowledge, SUShe is the only unsupervised algorithm that provides shadow removal performance similar to most of the state-of-the-art supervised methods compared, on the full, widely used benchmark datasets considered in this study. Furthermore, the comparisons with state-of-the-art deep learning-based methods in terms of computational complexity showed that SUShe is much more efficient (Table 7 and Fig. 11).

Overall, the following conclusions can be derived:

  • SUShe is very simple to implement and of low computational complexity.

  • Its computational complexity is generally lower than that of the state-of-the-art algorithms for shadow removal.

  • The results obtained indicate that SUShe can remove shadows better than any of the compared state-of-the-art unsupervised shadow removal methods.

  • In comparison with the supervised state-of-the-art shadow removal methods, its performance is comparable or better.

  • Solving the shadow removal problem does not necessarily require complex deep learning-based solutions.

Shadow removal is instrumental in various domains, such as remote sensing image processing, traffic monitoring, and object recognition. Future work will involve the evaluation of SUShe in applications where rapid system response is required, such as assistive navigation systems [13]. Furthermore, SUShe can also be applied in the medical domain, to investigate how shadow removal can improve medical imaging results in the shadowed regions of internal body organs, e.g., the shadowed regions in images obtained from gastrointestinal capsules, diagnostic ultrasound, etc.