A review of adaptable conventional image processing pipelines and deep learning on limited datasets

The objective of this paper is to study the impact of limited datasets on deep learning techniques and conventional methods in semantic image segmentation, and to conduct a comparative analysis that determines the best scenario for each approach. We introduce a synthetic data generator, which enables us to evaluate the impact of the number of training samples as well as the difficulty and diversity of the dataset. We show that deep learning methods excel when large datasets are available, while conventional image processing approaches perform well when the datasets are small and diverse. Since transfer learning is a common approach to work around small datasets, we specifically assess its impact and find it to be only marginal. Furthermore, we implement the conventional image processing pipeline to enable fast and easy application to new problems, making it easy to apply and test conventional methods alongside deep learning with minimal overhead.


Introduction
Semantic segmentation is a crucial task in computer vision, widely used in fields like autonomous driving, medical tissue evaluation, and remote sensing image analysis. Deep learning (DL) methods, including convolutional neural networks (CNN) [1][2][3] and vision transformers (ViT) [4], have become the preferred approach to this type of problem due to their outstanding performance.
DL approaches are adaptive and can be applied to a wide range of tasks with little effort. Consequently, they have become the go-to solution for this type of problem, while conventional image processing techniques, such as Thresholding, Watershed, Active Contour, (Super) Pixel Classification and Handcrafted Features, are often overlooked. Nevertheless, automated and sophisticated conventional image processing pipelines (CIPPs) [5][6][7][8] are still available. DL methods, however, have their downsides as well. The training process for DL involves representation learning and requires a significant amount of computational resources. Although researchers are currently exploring interpretability and explainability in DL [9,10], the available methods, such as class activation maps and gradient analysis, are only applicable to image classification.
In contrast, CIPP approaches excel in areas such as computational complexity, inference speed, and explainability. The decision process of a CIPP can be easily analyzed by executing and visualizing each step separately, as a CIPP consists of many well-understood steps. CIPPs are especially suitable if the problem at hand is easy to solve or if an efficient and simple solution is needed [11]. An expert can inject implicit knowledge into a CIPP, reducing the amount of information that needs to be learned. Therefore, CIPPs can be successful when few data points or computational resources are available [12].
These properties of DL and CIPP show the potential of both approaches and their ability to complement each other when applied in the right scenario. The general consensus is that DL performs best on large and diverse datasets, while CIPPs are applied to small and easy datasets. Studies comparing DL and conventional image processing in the field of image classification [13][14][15][16][17][18] or semantic segmentation [17][19][20][21][22][23][24][25][26][27] show that DL methods consistently exceed or at least match the performance of conventional techniques. All these comparisons were performed on individual datasets without evaluating the underlying dataset properties. A neutral and systematic evaluation of the applicability of CIPPs and DL in relation to the properties of semantic segmentation datasets, together with guidelines for application, is currently missing.

Fig. 1 Concept: A synthetic dataset generator creates a dataset with a given difficulty D. A subset with N images is sampled from this dataset. On this subset, we train a DL model and a CIPP and compare the performances of both approaches. The performance is measured using the F1-score. This allows us to estimate the "Break-Even-Point" (BEP), up to which the CIPP is still able to outperform DL. In the end, we aggregate the experimental results of each dataset to define the specifications for the usability of CIPPs over DL
In this paper, we aim to address this gap by analyzing the strengths and weaknesses of DL models compared to CIPPs in terms of dataset properties. We introduce an automatically optimized conventional image processing pipeline, which is as easy to apply to a problem as a DL method, and provide a novel synthetic dataset generator that enables us to investigate the behavior of DL and CIPP for various difficulties and different numbers of images. The benchmark dataset supports different tunable noise types to increase the difficulty. Additionally, we evaluate different dataset sizes with respect to the influence of stochastic errors and heterogeneous errors in training and testing. Finally, we provide guidelines for choosing the appropriate algorithm (CIPP/DL) based on the characteristics of the dataset and problem.

Concept
In this paper, we conduct a study on the performance of DL and CIPP approaches for semantic segmentation to identify the conditions under which CIPPs perform better than DL. The concept of the study is shown in Fig. 1. We focus specifically on the impact of the amount of training data and the difficulty of the task.
Therefore, we introduce a synthetic dataset generator that enables us to quantify and isolate the properties of a semantic segmentation task. Synthetic data is generated with a clearly defined difficulty D. From each dataset, we randomly draw N images and train a DL model and a CIPP on this subset. For each subset with difficulty D and number of images N, we compare the performance of the DL model and the CIPP and determine the "Break-Even-Point" (BEP) for each dataset. We expect the CIPP to perform well on easy datasets when few training images are provided. To test this hypothesis, we aggregate the results over all datasets to specify the region in which a CIPP outperforms DL in relation to the number of training samples and the difficulty of the dataset.
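Every comparison in this study is expressed as an F1-score (Fig. 1). How the score is aggregated is not detailed here, so the sketch below assumes a pixel-wise F1-score (equivalently, the Dice coefficient) between a predicted and a ground-truth binary mask. It also counts two empty masks as a perfect match. Both conventions, and the function name `f1_score_mask`, are our own choices.

```python
import numpy as np


def f1_score_mask(pred: np.ndarray, label: np.ndarray) -> float:
    """Pixel-wise F1-score (= Dice coefficient) of two binary masks.

    Assumptions: masks have the same shape and are boolean or 0/1 arrays;
    two empty masks are treated as a perfect match (score 1.0).
    """
    pred = pred.astype(bool)
    label = label.astype(bool)
    tp = np.logical_and(pred, label).sum()   # object pixels found
    fp = np.logical_and(pred, ~label).sum()  # background marked as object
    fn = np.logical_and(~pred, label).sum()  # object pixels missed
    denom = 2 * tp + fp + fn
    if denom == 0:  # nothing predicted and nothing to find
        return 1.0
    return float(2 * tp / denom)
```

Averaging this score over the full test set gives one entry of a performance matrix indexed by difficulty D and training size N.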

Synthetic dataset
To generate synthetic datasets for the comparison of semantic segmentation approaches, we model an image generation pipeline as depicted in Fig. 2. Each generated dataset contains unique images, with $N_{train} = 512$ in the training set and $N_{test} = 512$ in the test set. Each image $I_i$ is a square RGB image with height and width $s_{img} = 400$ px and has a corresponding binary label map $L_i$ of the same size. In the images and their respective label maps, we place an elliptical object on top of heterogeneous structures that constitute the background. The object and background are slightly altered, e.g., texture on the object and different background colors, to ensure a baseline difficulty for our segmentation task. Subsequently, different types of noise are added with a defined difficulty D to increase the difficulty further.
In detail, the images are generated as illustrated in Fig. 2.

Fig. 2 The data generation process: we start with an empty frame (top left) and create a background (Gaussian blobs) on the frame, before the object (ellipse) to be identified is inserted on top. As specified by the user, three noise types (blurring, salt-n-pepper, color-shift) are applied

The individual steps are the following:
Create background: The background is drawn first and covers the entire image $I_i$, giving the segmentation problem a baseline difficulty. In this study, we used a background consisting of 50-200 randomly generated Gaussian distributions (blobs). The color of all blobs in a single image $I_i$ is randomly chosen from the candidates brown, purple, and teal, which all differ from the color of the object to be identified (added in the next step).
Insert object: Next, an elliptical object is placed at a random position in the image $I_i$ and the respective label map $L_i$. Here, we use a green ellipse that has a salt-n-pepper texture and varies slightly in shape, color, and degree of texture.
Apply noise: Noise is added last to an image $I_i$ and applied to both the background and the object. The user defines the noise difficulty $D_{Noise} \in [0\%, 100\%]$, which determines the diversity and maximum strength of the applied noise for the entire dataset. The exact degree of noise applied to an individual image $I_i$ is defined by the noise parameter $g_{Noise,i}$, which is sampled from an interval $G_{Noise}$ as shown in Fig. 3. To be precise, $g_{Noise,i}$ is sampled uniformly from $G_{Noise}$, which is defined as follows:

$$G_{Noise} = \left[\, g^{min}_{Noise},\; D_{Noise} \cdot g^{max}_{Noise} \,\right], \qquad g_{Noise,i} \sim \mathcal{U}\left(G_{Noise}\right)$$

The lower limit of the interval $G_{Noise}$ is the minimum possible noise parameter $g^{min}_{Noise}$, and the upper limit is the maximum possible noise parameter $g^{max}_{Noise}$ scaled by the defined difficulty $D_{Noise}$.
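The per-image sampling rule can be written in a few lines. This is a minimal sketch, not the generator's actual code. The function name is hypothetical, and it expresses $D_{Noise}$ as a fraction in [0, 1].

```python
import random


def sample_noise_parameter(d_noise: float, g_min: float, g_max: float,
                           rng: random.Random) -> float:
    """Draw g_{Noise,i} uniformly from G_Noise = [g_min, d_noise * g_max].

    d_noise is the dataset-wide difficulty as a fraction (0% -> 0.0,
    100% -> 1.0). With g_min = 0, as for all three noise types, a difficulty
    of 0 always yields a noise parameter of 0 (no added noise).
    """
    if not 0.0 <= d_noise <= 1.0:
        raise ValueError("d_noise must lie in [0, 1]")
    upper = d_noise * g_max
    if upper < g_min:
        raise ValueError("empty interval: d_noise * g_max < g_min")
    return rng.uniform(g_min, upper)


rng = random.Random(0)
# Salt-n-pepper with D_SNP = 30%: each image gets its own strength in [0, 76.5].
strengths = [sample_noise_parameter(0.3, 0, 255, rng) for _ in range(5)]
```

Because each image draws its own parameter, raising $D_{Noise}$ widens the spread of noise levels in a dataset as well as its maximum. This is the "diversity" effect described in the text.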
This sampling process ensures that the noise difficulty $D_{Noise}$ defines both the diversity of the noise and the maximum amount of noise applied. The concept of applying a varying degree of noise to every generated image is inspired by real-world applications, where some samples are easier to identify while others are noisier. $D_{Noise} = 0\%$ means that no additional noise is added to a dataset, but the properties of the object, as well as the background, still differ between images, which constitutes a baseline difficulty for our synthetic dataset. By increasing the noise difficulty $D_{Noise}$, a larger interval $G_{Noise}$ of noise parameters is covered, thus raising the overall level and diversity of noise in a dataset.

Fig. 3 The noise difficulty $D_{Noise}$ is set by the user for the whole dataset and defines the upper limit of the interval $G_{Noise}$. The noise parameter $g_{Noise,i}$ is then uniformly sampled from the interval $G_{Noise}$ and applied to the image $I_i$. This is repeated for all images in the dataset

The specific noise options are the following:
• Blurring: A normalized box filter is applied to the image, thus blurring the object to be identified. The noise parameter corresponds to the kernel size, with $g^{min}_{BL} = 0$ and $g^{max}_{BL} = s_{img} = 400$ px (the image side length).
• Salt-n-pepper: For each pixel, a random value is generated and added to or subtracted from the original pixel value. The noise parameter limits the maximum random value that can be generated, with $g^{min}_{SNP} = 0$ and $g^{max}_{SNP} = 255$.
• Color-shift: For each channel, a random value is generated and added to or subtracted from the original channel. The noise parameter bounds the value added, with $g^{min}_{CS} = 0$ and $g^{max}_{CS} = 255$.
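The three noise types listed above can be sketched with NumPy alone. The sketch makes several assumptions that are not stated in the text:

- borders are edge-replicated for the box filter;
- the salt-n-pepper offset drawn for a pixel is shared by its three channels;
- both offsets are drawn uniformly from $[-g, g]$;
- results are clipped to [0, 255].

The function names are hypothetical, and the real generator may differ.

```python
import numpy as np


def box_blur(img: np.ndarray, k: int) -> np.ndarray:
    """Normalized k x k box filter via an integral image (edge-replicated borders)."""
    k = int(k)  # the sampled noise parameter may be a float
    if k <= 1:
        return img.copy()
    lo, hi = k // 2, k - 1 - k // 2
    padded = np.pad(img.astype(np.float64), ((lo, hi), (lo, hi), (0, 0)), mode="edge")
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1, img.shape[2]))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape[:2]
    # Sum of each k x k window from four corners of the integral image.
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return np.clip(np.rint(window_sum / (k * k)), 0, 255).astype(np.uint8)


def salt_n_pepper(img: np.ndarray, g: float, rng: np.random.Generator) -> np.ndarray:
    """Add one random offset in [-g, g] per pixel (assumed shared by all channels)."""
    offset = rng.uniform(-g, g, size=img.shape[:2] + (1,))
    return np.clip(img.astype(np.float64) + offset, 0, 255).astype(np.uint8)


def color_shift(img: np.ndarray, g: float, rng: np.random.Generator) -> np.ndarray:
    """Add one random offset in [-g, g] per color channel to the whole image."""
    offset = rng.uniform(-g, g, size=(1, 1, img.shape[2]))
    return np.clip(img.astype(np.float64) + offset, 0, 255).astype(np.uint8)


# Usage: apply all three noise types to a random RGB image. The order of
# application is another assumption of this sketch.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
noisy = color_shift(salt_n_pepper(box_blur(img, 9), 40, rng), 40, rng)
```

The integral image makes the cost of the blur independent of the kernel size. This matters here because $g^{max}_{BL}$ equals the full image side of 400 px.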
In real-world applications, the three types of noise are influenced by various properties of the recording device, such as the employed optics or the resolution of the detector, and are therefore not directly related to each other. Consequently, a general parameter D describing the degree of noise in a dataset can be calculated as the mean of the individual difficulties:

$$D = \frac{1}{3}\left(D_{BL} + D_{SNP} + D_{CS}\right)$$

To simplify matters, we generate our synthetic dataset using equal noise levels for all types, i.e., $D_{BL} = D_{SNP} = D_{CS} = D$.

In conclusion, the generation pipeline produces pairs of RGB images and binary label maps with elliptical objects for the purpose of semantic segmentation. The elliptical objects exhibit a textured surface and vary slightly, but differ from the blurred background in their sharp edges and color. Figure 4 presents examples of a dataset with D = 0%, the baseline difficulty. By increasing the level of noise, the edges of the objects are blurred, texture is added across the entire image, and the colors of the whole image are shifted, complicating the segmentation task. The code to create synthetic datasets can be found here: https://github.com/FMuenke/synthetic-dummy-dataset.

Fig. 4 Five randomly generated images of the baseline dataset with an overall difficulty D = 0% (baseline). The difficulty D is then increased by applying noise
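The following is a simplified sketch of the first two generation steps, producing a Gaussian-blob background and a textured green ellipse with its label map. It is not the code from the linked repository. The following choices are our assumptions:

- the RGB values chosen for "brown", "purple", "teal" and green;
- the blob widths and ellipse sizes;
- the texture strength;
- the black canvas;
- the absence of ellipse rotation.

Noise would be applied afterwards.

```python
import numpy as np

# Approximate RGB values for the color names in the text (assumption).
BLOB_COLORS = {"brown": (150, 90, 40), "purple": (120, 60, 160), "teal": (0, 128, 128)}
OBJECT_GREEN = np.array([40, 200, 60], dtype=np.float64)


def generate_sample(rng: np.random.Generator, s_img: int = 400):
    """Return one (image, label) pair: blob background plus a textured green ellipse."""
    yy, xx = np.mgrid[0:s_img, 0:s_img].astype(np.float64)

    # Background: 50-200 Gaussian blobs, all in one randomly chosen color.
    color = np.array(BLOB_COLORS[rng.choice(list(BLOB_COLORS))], dtype=np.float64)
    intensity = np.zeros((s_img, s_img))
    for _ in range(rng.integers(50, 201)):
        cy, cx = rng.uniform(0, s_img, size=2)
        sigma = rng.uniform(0.0125, 0.1) * s_img  # 5-40 px for s_img = 400
        intensity += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    image = np.clip(intensity, 0, 1)[..., None] * color  # blobs on a black canvas

    # Object: axis-aligned ellipse at a random position, fully inside the image.
    a = rng.uniform(0.10, 0.20) * s_img  # horizontal semi-axis
    b = rng.uniform(0.06, 0.15) * s_img  # vertical semi-axis
    cy, cx = rng.uniform(b, s_img - b), rng.uniform(a, s_img - a)
    label = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0

    # Slight color variation and a salt-n-pepper texture whose strength varies per image.
    green = OBJECT_GREEN + rng.uniform(-15, 15, size=3)
    texture = rng.uniform(-1, 1, size=(s_img, s_img, 1)) * rng.uniform(20, 60)
    image[label] = np.clip(green + texture[label], 0, 255)
    return image.astype(np.uint8), label


img, lab = generate_sample(np.random.default_rng(0), s_img=128)
```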

Semantic segmentation models
Conventional image processing relies on simple operations such as thresholding, edge detection, or morphological operations, where each operation can be specified with individual parameters. We define a CIPP model as a static sequence of conventional image processing operations. As depicted in Fig. 5, our implementation of a CIPP model provides a framework for an expert to stack these operations without manually setting parameters. Each operation has a predefined set of parameters. In this paper, we select the best parameters by running all available training images through all possible combinations of parameterized pipelines (grid search) and selecting the parameter combination with the best performance on the training data. Besides grid search, our framework provides other optimization strategies, such as random search or a genetic algorithm.
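The grid search described above works as follows: the user fixes the order of operations, each operation contributes a list of candidate parameter values, and every combination is scored on the training images. The sketch below illustrates this with two toy operations (`invert`, `threshold`). These operations, `PIPELINE` and `grid_search` are all hypothetical and are not the API of the authors' `cipp` package.

```python
import itertools

import numpy as np


def f1(pred, label):
    """Pixel-wise F1-score; two empty masks count as a perfect match."""
    tp = np.logical_and(pred, label).sum()
    denom = pred.sum() + label.sum()
    return 1.0 if denom == 0 else 2.0 * tp / denom


def invert(img, enabled):
    """Optionally map maximum intensities to minimum ones."""
    return 255 - img if enabled else img


def threshold(img, t):
    """Binary segmentation by a fixed threshold."""
    return img > t


# (operation, parameter grid) in the user-defined order (hypothetical example).
PIPELINE = [
    (invert, {"enabled": [False, True]}),
    (threshold, {"t": [64, 128, 192]}),
]


def run(pipeline, params, img):
    """Apply each operation with its chosen parameters, in order."""
    out = img
    for (op, _), p in zip(pipeline, params):
        out = op(out, **p)
    return out


def grid_search(pipeline, images, labels):
    """Try every parameter combination; keep the best mean F1 on the training data."""
    per_op = [[dict(zip(grid, vals)) for vals in itertools.product(*grid.values())]
              for _, grid in pipeline]
    best_params, best_score = None, -1.0
    for params in itertools.product(*per_op):
        score = np.mean([f1(run(pipeline, params, x), y) for x, y in zip(images, labels)])
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The number of evaluated pipelines is the product of the grid sizes (here 2 × 3 = 6). This is why the text lists random search and genetic algorithms as alternatives for larger grids.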
The CIPP model is specifically designed to use simple techniques to ensure intuitive application to a problem, explainable results, and fast inference, even when few data points and computational resources are available. The strengths of CIPPs only pay off if CIPPs are as easy to apply to a problem as DL. Thus, we have created an easily installable Python package to enable the simple use of CIPPs.
The CIPP is designed to solve the synthetic dataset presented in Sect. 3. The segmentation target has two distinct attributes, a salt-n-pepper texture and a bright green color, which are detectable with edge detection and thresholding. The CIPP used is visualized in Fig. 6. To increase the processing speed, we reduce the image size to 200 px × 200 px and apply the CIPP only to the green channel. Afterward, the CIPP has the option to blur the image at different scales to remove noise. The subsequent inversion operation lets the CIPP decide whether the image should be inverted (maximum mapped to minimum). Segmentation is performed by applying Thresholding, Otsu-Thresholding [28] or Edge-Detection. The segmentation mask is post-processed by applying Closing and Eroding. Further details on the image processing operations are found in Tab. 1. The implementation of the CIPP can be found here: https://github.com/FMuenke/cipp.

In the domain of image segmentation, the U-Net [1] is a prominently used neural network model [29][30][31][32] that we employ as our representative for DL. We use the implementation from [33]. The hyperparameters for training the U-Net were determined through a brief random search¹ to fit the synthetic dataset. The final parameters are the following:
• Loss: Dice
• Optimizer: Adam, learning rate: $10^{-5}$
• Early stopping after 100 epochs without improvement of the validation loss
• Learning-rate scheduling (factor 0.5 after 50 epochs)
• Augmentations: horizontal/vertical flip, rotation, cropping
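The early-stopping and learning-rate rules in the list above can be expressed independently of any DL framework. The sketch assumes that "factor 0.5 after 50 epochs" means halving the learning rate after 50 epochs without improvement of the validation loss. This is the plateau-based behavior of, e.g., PyTorch's `ReduceLROnPlateau` and is our reading rather than a confirmed detail. The class name is hypothetical.

```python
class TrainingController:
    """Plateau-based LR halving plus early stopping on the validation loss.

    Assumption: the LR is halved after `lr_patience` epochs without
    improvement; training stops after `stop_patience` such epochs.
    """

    def __init__(self, lr=1e-5, factor=0.5, lr_patience=50, stop_patience=100):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best_loss = float("inf")
        self.epochs_since_best = 0       # drives early stopping
        self.epochs_since_lr_change = 0  # drives learning-rate halving

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_since_best = 0
            self.epochs_since_lr_change = 0
        else:
            self.epochs_since_best += 1
            self.epochs_since_lr_change += 1
            if self.epochs_since_lr_change >= self.lr_patience:
                self.lr *= self.factor
                self.epochs_since_lr_change = 0
        return self.epochs_since_best >= self.stop_patience
```

With a validation loss that never improves after the first epoch, this controller halves the learning rate twice (after 50 and 100 stagnant epochs) and stops at the 100th.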
During the random search, it became evident that a batch size of 8 improved performance significantly (+30% F1-score) compared to a batch size of 1. When training with fewer than 8 images, the batch size is set to the number of training images; otherwise, a batch size of 8 is used.
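The batch-size rule amounts to $\min(N, 8)$. The sketch below also derives the number of optimizer steps per epoch, assuming that any final partial batch is kept. For the power-of-two training sizes used in this study, no partial batch occurs. The function names are hypothetical.

```python
import math


def effective_batch_size(n_train: int, max_batch: int = 8) -> int:
    """min(n_train, max_batch): small subsets are trained as one full batch."""
    if n_train < 1:
        raise ValueError("need at least one training image")
    return min(n_train, max_batch)


def steps_per_epoch(n_train: int) -> int:
    """Optimizer steps per epoch, assuming a final partial batch is kept."""
    return math.ceil(n_train / effective_batch_size(n_train))


for n in (4, 8, 16, 32, 64, 128):
    print(n, effective_batch_size(n), steps_per_epoch(n))
```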
During training, the only augmentation techniques used are horizontal/vertical flips, rotation, and cropping, since the synthetic dataset already contains salt-n-pepper noise, blurring, and color-shift. […] Thus, we consider the baseline U-Net as described (U-Net-R18) and the same U-Net with an encoder pretrained on ImageNet [35] (U-Net-R18-I) in our experiments.

Overview
We train the three types of models introduced in Sect. 4. Each model is trained on a synthetic dataset that covers all types of noise (blurring, salt-n-pepper, and color-shift) simultaneously. The difficulty D of this dataset is increased by raising the separate noise difficulties $D_{BL}$, $D_{CS}$, and $D_{SNP}$ equally, as shown in Fig. 7. We evaluate the difficulties from 0% to 50% in steps of 5%, plus 100%.
We train with different numbers of training images $N \in \{4, 8, 16, 32, 64, 128\}$ for each difficulty D. N refers only to the number of images used for training. Since the U-Net models require validation data to determine the optimal time to stop training, we always supply the U-Nets with an additional, equally sized set of N validation images.² We train the CIPP and the DL models on each subset and compare their F1-scores on the full test set. Since the images are selected randomly, we repeat each training 20 times to reduce the random deviation introduced by the initialization of the U-Net and the choice of training images. When sampling images, we ensure that both approaches are trained on the same images by fixing the random seed (e.g., the first iteration of the CIPP is trained on the same images as the first iteration of the U-Net models). The difficulty, as described in Sect. 3, represents both the strength of the applied noise and the diversity of the dataset.
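The experimental grid and the shared-seed sampling can be sketched as follows. The seeding scheme is our own illustration of "setting the random seed": the seed is derived only from (difficulty, N, run), so the CIPP and both U-Nets receive identical training images in the same run. Drawing the validation images from the same 512-image training pool is also an assumption, as are all names in the sketch.

```python
import random

DIFFICULTIES = list(range(0, 55, 5)) + [100]  # difficulty D in percent
TRAIN_SIZES = [4, 8, 16, 32, 64, 128]        # N
N_RUNS = 20                                   # repetitions per (D, N)
POOL_SIZE = 512                               # images in each generated training set


def draw_subset(difficulty: int, n: int, run: int):
    """Indices of N training and N validation images for one run.

    The seed depends only on (difficulty, n, run), so every model sees the
    same training images in the same run; the validation images are disjoint
    from them and only used by the U-Nets for early stopping.
    """
    rng = random.Random(f"{difficulty}-{n}-{run}")
    picked = rng.sample(range(POOL_SIZE), 2 * n)
    return picked[:n], picked[n:]


experiments = [(d, n, r) for d in DIFFICULTIES for n in TRAIN_SIZES for r in range(N_RUNS)]
```

This yields 12 difficulties × 6 training sizes × 20 runs = 1440 trainings per model type.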

Baseline U-Net
The average results of the U-Net-R18 on our dataset are displayed in Fig. 8. The U-Net-R18 performs well at difficulties D ≤ 5% regardless of the number of training images, with F1-scores above 93%. As expected, performance decreases with increasing difficulty and increases with the number of training images. Provided enough training images, the U-Net-R18 still reaches an F1-score above 73% even at higher difficulties up to D = 50%. Only at the difficulty D = 100% is the U-Net-R18 unable to learn adequate filters for the segmentation task, and it cannot exceed an F1-score of 6%.

Pretrained U-Net
Figure 8 presents the average results of U-Net-R18-I, which was pretrained on ImageNet. The performance of U-Net-R18-I closely matches that of U-Net-R18: as the difficulty increases, the performance of both models decreases, while increasing the number of training images improves it. Averaged over all combinations of difficulty and number of training images, U-Net-R18-I outperforms U-Net-R18 by only 0.56%. The performance gap between the two models is generally below 9%, and most differences larger than 3% occur with fewer than 16 training images. Hence, the effect of pretrained weights on model performance is negligible in this scenario. We assume that this is because the available pretrained weights are not tailored to the domain they are applied to.

CIPP
We assess the performance of the CIPP and present the results in Fig. 8. Unlike the U-Nets, the CIPP is less sensitive to the number of training images. Increasing the number of training images from N = 16 to N = 128 yields an improvement of at most 7% at every difficulty level. The performance gain is more pronounced when increasing the number of training images from N = 4 to N = 16, with an average improvement of around 11%. Although the CIPP's performance decreases as difficulty increases, it still maintains a comparatively high F1-score of 26% even at the highest difficulty level of 100%.

Comparison
We compare the three models side by side at three difficulty levels in Fig. 9. Rather than presenting only the average performance, we show the results of all 20 experimental runs, which reveals the variation in performance for different numbers of training images N. For all models, the variation decreases as N increases. Moreover, for both U-Nets, the deviation between separate runs clearly increases with the difficulty of the dataset.
In Fig. 10, we compare the average performances of the three models by subtracting the performance matrix of the CIPP from those of the U-Nets.This yields a matrix that highlights the differences between the U-Nets and the CIPP.A positive value indicates superior performance by the U-Nets, while a negative value indicates superior performance by the CIPP.We observed that both matrices are similar, as the performances of the U-Nets are comparable.The CIPP outperforms the U-Nets at N = 4 and D = 25%.With increasing difficulty, all models exhibit a drop in performance, but the CIPP maintains a more stable performance.Further, the CIPP is able to outperform the U-Nets at D = 50% even for N = 32 training images.At the highest difficulty level of 100%, the CIPP performs better across all numbers of training images.
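The comparison above subtracts the CIPP's performance matrix from a U-Net's. The same matrix also yields the break-even point, which we read here as the largest N at which the CIPP still wins for a given difficulty. That reading of the BEP, the function names and the numbers in the usage example are illustrative assumptions, not results from this study.

```python
import numpy as np


def difference_matrix(f1_unet: np.ndarray, f1_cipp: np.ndarray) -> np.ndarray:
    """U-Net minus CIPP mean F1; rows = difficulty D, columns = training size N.

    Positive entries favour the U-Net, negative entries favour the CIPP.
    """
    return f1_unet - f1_cipp


def break_even_points(diff: np.ndarray, train_sizes):
    """For each difficulty row, the largest N at which the CIPP still wins (None if never)."""
    bep = []
    for row in diff:
        wins = [n for n, d in zip(train_sizes, row) if d < 0]
        bep.append(max(wins) if wins else None)
    return bep


# Made-up numbers for illustration only (two difficulties, three values of N).
sizes = [4, 16, 64]
unet = np.array([[0.95, 0.96, 0.97], [0.60, 0.80, 0.90]])
cipp = np.array([[0.90, 0.92, 0.93], [0.70, 0.75, 0.78]])
diff = difference_matrix(unet, cipp)
print(break_even_points(diff, sizes))  # [None, 4]
```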
Overall, the CIPP exhibits more stable and consistent performance than the U-Nets and is less affected by changes in dataset difficulty and the number of training images. Additionally, at higher difficulty levels, the spread of the results over the 20 independent runs is smaller for the CIPP than for the U-Nets, as seen in Fig. 9. It is worth noting that the U-Nets perform outstandingly with a small number of training images, particularly for difficulties D ≤ 15%. We suspect that the U-Nets are capable of fitting the provided data due to the limited diversity of the dataset and because the validation images closely resemble the general dataset.
When comparing the inference speed of DL and CIPP on a MacBook with a 2.3 GHz Quad-Core Intel Core i7 processor, the DL approach processes 2.36 images per second, compared to 62.1 images per second for the CIPP. This makes the CIPP especially relevant for devices with low computational capacity, such as microcontrollers.
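Converted to per-image latency, the two measured throughputs give the following (simple arithmetic on the reported numbers):

$$t_{DL} = \frac{1}{2.36\ \text{images/s}} \approx 0.42\ \text{s}, \qquad t_{CIPP} = \frac{1}{62.1\ \text{images/s}} \approx 16.1\ \text{ms}, \qquad \frac{62.1}{2.36} \approx 26.3$$

On this machine, the CIPP is therefore roughly 26 times faster per image than the U-Net.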

Transferability
The results presented in this paper are derived from synthetic data, raising the question of whether these findings can be extrapolated to real-world datasets. In the domain of biomedical image processing, datasets often show high diversity within and between datasets and are typically limited in size. Our research suggests that datasets sharing similar inherent features yield results comparable to those obtained on our synthetic dataset. We have evaluated the effectiveness of the CIPP on four real-world datasets in Appendix 1. The LIVECell dataset [36] and the DOORS dataset [37] exhibit significant diversity between their training and testing subsets. This diversity leads to the anticipated superiority of the CIPP over the U-Net-R18-I. On the Derma ISIC dataset [38], both models demonstrate comparable performance, owing to the dataset's relatively limited diversity. Conversely, on the CryoNuSeg dataset [39], the CIPP performs comparatively worse due to the limited diversity among segmentation targets.

Conclusions
So far, there is no comprehensive study comparing conventional image processing to modern deep learning algorithms with respect to dataset-specific properties. Thus, we introduced a synthetic dataset with tunable degrees of difficulty and conducted an exhaustive study on DL approaches and our own easy-to-apply implementation of a CIPP. The dataset serves as a versatile benchmark and will be used in future studies as well. Furthermore, it can be used to educate students and researchers in understanding and comparing the performance of semantic segmentation approaches.
Our findings show that DL performs best on tasks with low difficulty/diversity and large amounts of training data. Deep learning is able to consider context and shapes, which makes it effective in recognizing the target even with few training images. However, if only a few training images are provided, the diversity of the dataset is not properly represented, leading to decreased DL performance. In such cases, the CIPP generalizes better due to human expert input and the limited parameter space to optimize.
Overall, we recommend using our CIPP implementation in all scenarios due to its ease of application and low resource requirements. Our proposed CIPP implementation works with the same data format as most DL frameworks, reducing the additional effort required for adoption. Additionally, CIPPs allow for easy understanding and adaptation of the processing pipeline to new data, making them useful in laboratory settings with few experimental modalities that require quick adaptation with minimal computational costs. Finally, the CIPP can also be used to post-process the outputs of DL approaches by removing artifacts, or to support the labeling process by quickly providing label maps that a human operator can then correct.
Our study highlights the importance of understanding the strengths and weaknesses of both deep learning methods and conventional image processing pipelines.Researchers and practitioners can use this knowledge to choose the most appropriate approach for their specific task and dataset, based on the available resources and desired performance metrics.
In our future research, we plan to expand the capabilities of our CIPP implementation and assess its ability to assist human annotators in fast and efficient pre-labeling.Specifically, we aim to enhance our CIPP with additional image processing techniques and optimize its performance on various types of image datasets.Additionally, we will investigate the potential of our CIPP to be used in combination with DL methods to further improve semantic image segmentation accuracy.We will also explore the possibility of integrating our CIPP into existing annotation tools to facilitate the labeling process for human annotators.
Funding Open Access funding enabled and organized by Projekt DEAL.
Data availability All synthetic data can be generated by executing the provided code. The data, including the presented results, can also be obtained by contacting the corresponding author.

Declarations
Conflict of interest The authors declare that they have no financial, non-financial or other competing interests.

Code availability:
The code is made fully available.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

B.2: Benchmark datasets
We have assessed the U-Net-R18-I and the CIPP on a selection of four benchmark datasets. LIVECell [36] is a dataset of phase-contrast images for cell segmentation. We select a subset of this dataset with a distinctive property: the training crops have a resolution of 256 px × 256 px, while the test images have a resolution of 704 px × 520 px. The CIPP shows no deviation between runs and reliably selects its optimal parameters, even with just N = 4 training images. Conversely, the U-Net's deviation is generally higher and decreases with an increasing number of training images. Although the dataset lacks diversity in appearance, the resolution shift greatly affects the U-Net's performance, whereas the CIPP, as demonstrated in previous tests, is more robust to data diversity and remains unaffected. While we expect the U-Net to perform well on images of the same resolution, this example underlines the robustness of the CIPP model.
DOORS [37] is a synthetic dataset for detecting boulders on the surface of small bodies. We train on images showing one boulder and test on images with multiple boulders to evaluate the impact of diversity between the training and test sets. As expected, the CIPP outperforms the U-Net-R18-I, as it is more resilient against diversity in the dataset when few training images are available.
Derma ISIC [38] focuses on skin lesion analysis and melanoma detection. The performances of the U-Net-R18-I and the CIPP are similar and overlap, with a slightly higher median for the CIPP. CryoNuSeg [39] is a dataset of Hematoxylin and Eosin (H&E)-stained images for nuclei segmentation from 10 different organs. The performances of both models are similar, but in this case the U-Net-R18-I slightly outperforms the CIPP.

B.3: Blurring
In this test series, we applied only blurring to gradually increase the difficulty $D_{BL}$; example images are visualized in Fig. 12. The results for the difficulties $D_{BL}$ = 10%, 50% and 90% are shown in Fig. 13. The U-Net used in this synthetic experiment did not use any pretraining or specific backbone. As expected, the deviation between independent training runs decreases with the number of training images N. On average, the CIPP performs significantly worse (65.64%) than the U-Net (80.20%), which indicates that the U-Net is generally better equipped to deal with blurring noise. With increasing difficulty $D_{BL}$, the performance of both methods drops significantly. The U-Net is more sensitive to higher difficulties than the CIPP. This allows the CIPP to outperform the U-Net for very few images and high difficulties, as visible in Fig. 14. In direct comparison, the U-Net handles blurring noise better than the CIPP in nearly every test case. We suppose that, in this specific case, the U-Net's ability to assess the shape and context of the image provides a crucial advantage.

B.4: Color-shift
In this test series, we applied only color-shift noise to the images, as visualized in Fig. 15. Three exemplary difficulties are shown in Fig. 16 to illustrate the impact of random initialization and image choice on both approaches. The U-Net used in this synthetic experiment did not use any pretraining or specific backbone. With increasing difficulty, the performance deviation increases for both approaches. Apart from a few outliers, the CIPP reaches peak performance up to a difficulty of $D_{CS} \le 50\%$, even with N = 4 images, while the U-Net produces less stable results. Notably, the peak performance of the CIPP drops at the difficulty $D_{CS} = 90\%$. We assume that at this point the CIPP lacks the necessary tools to compensate for the applied noise.

Fig. 11 The performance matrix of the LinkNet, FPN and PSPNet for all difficulties D and the number of training images N. The performance is measured using the average F1-score on the synthetic dataset as in Sect. 5

Fig. The performance is measured using the average F1-score. The comparison shows the performance gap between CIPP and U-Net
The CIPP does not improve from N = 16 to N = 128 and has already reached its full potential at N = 16. In comparison, the U-Net improves more from N = 16 to N = 128 than from N = 4 to N = 16, with an average improvement of 23.91%. The CIPP is able to solve this task nearly perfectly. It detects the sharp texture of the object, which is not affected by the applied color-shift. This way, the CIPP can solve the task by focusing on the texture while ignoring color changes. Only for very large color deviations can the texture vanish, when the color shifts so much that pixel values are clipped to the valid range [0, 255]. The U-Net, in contrast, can be confused by differences in color, especially when few images are presented and the change in color is substantial.
As visible in Fig. 17, the U-Net only outperforms the CIPP at difficulties $D_{CS} \ge 70\%$ and N ≥ 64.

B.5: Salt-N-pepper
This test series focuses on salt-n-pepper noise. Example images are visualized in Fig. 18. The results for three different difficulties $D_{SNP}$ are visualized in Fig. 19. The U-Net used

Fig. 5 Optimization process of a CIPP. The order of operations is set by the user, and each operation has a predefined set of parameters associated with it. During the optimization process, the optimal parameters are determined from the provided training data by grid search. All

Fig. The performance matrix of the U-Nets and the CIPP for all difficulties D and the number of training images N. The performance is measured using the average F1-score

We conducted 10 training runs for N = 4, 8, 16 training images. Example images for each dataset are shown in Fig., and the results are summarized in Fig.

Fig. 12 Example images from the selected benchmark datasets.The first row shows an example from the training set and the second row shows an example from the test set

Fig. 14 Exemplary images from the test series focusing on blurring

Fig. 17 Exemplary images from the test series focusing on color-shift

Fig. 18 Experimental results for the test series applying only color-shift, for different difficulties $D_{CS}$ over the number of training images N

Fig. 20 Exemplary images from the test series focusing on salt-n-pepper

Table 1
Details for the image processing operations used by the CIPP