1 Introduction

The human visual system (HVS) can separate colours even under challenging ambient light conditions. A healthy human eye can distinguish approximately 100 colour shades in each of its three types of cone cells. In total, the HVS can recognise roughly one million different colours (Hurlbert and Ling 2012).

Computer vision (CV) research develops solutions that are as accurate as, or more accurate than, the HVS. Colour imaging research, as part of computer vision research, focuses on colour inspection, sorting, detection, and matching. This research addresses colour inspection and matching, particularly colour differentiation.

All colours we see are combinations of hue, saturation, and brightness values. In digital imaging, light reflected from an object is captured as a digital representation by a digital camera. The entire process is complex and involves electronics, signal processing, and algorithms. It is unique to each device and depends on the imaging conditions. Therefore, the resulting digital images always vary slightly, more in non-controlled environments and less in controlled ones. The same colour can appear as many different digital representations, or the same digital representation can correspond to two separate colours.

Computer vision research has identified many solutions for colour differentiation. These approaches are based on mathematical algorithms or artificial intelligence (AI). Mathematical algorithms work well in conditions where colours are clearly different, their location is known, and the data is high quality (Isohanni 2022). Artificial intelligence (supervised and unsupervised) is primarily used to recognise objects of different colours and patterns.

However, current solutions face challenges when dealing with small colour differences in unknown locations. Previous studies have used unsupervised learning to segment/cluster colours. Although the reported performance of algorithms has improved, past research has found that unsupervised colour segmentation does not work in all use-cases (Xu et al. 2018). Improving the quality of the clustering process, or finding the best clustering method, can have an impact on, for example, healthcare (Vishnuvarthanan et al. 2016), smart city (Mao and Li 2019) and agriculture (Abdalla et al. 2019) applications. The challenges unsupervised methods face are that a) some algorithms require the number of clusters to be set before running, and b) some algorithms are sensitive to the initial guess of cluster centres and might get stuck in local optima (Abdalla et al. 2019).

In this study, the recognition of small colour differences with unsupervised learning was investigated using printed inks. One direct use-case of this research is colour recognition of functional inks. The development of novel printing methods and inks has enabled the labelling and packaging industry to create innovative labelling methods. Some of these innovations use functional inks, which change their colour depending on environmental conditions, and it is important to detect this change reliably (Isohanni 2022). This research focuses on recognising colour differences using unsupervised clustering methods and compares the methods, their accuracy, and their running time. The contributions of the research are:

  • Comparison of unsupervised learning methods in colour difference recognition

  • Approach to detect small colour changes in printed colours with unsupervised learning

This research is structured as follows: Sect. 2 reviews relevant previous research. Section 3 defines the methods and materials used in this study. The results of this study are presented in Sect. 4. Finally, Sect. 5 discusses conclusions and future research needs.

2 Related work

Colour recognition has its role in object detection, object recognition, image segmentation and many other applications. Most research done around colour recognition focuses on high-level use-cases: for example, colour recognition is used in animal/plant recognition (Koubaroulis et al. 2002; Jhawar 2016), in dental applications (Bretzner et al. 2002; Bar-Haim et al. 2009; Riri et al. 2016; Kang and Ji 2010), in face/skin recognition (Yang et al. 2010), in robotics (Rabie 2017; Bazeille et al. 2012) and in intelligent traffic (Gao et al. 2006; Gong et al. 2010; De la Escalera et al. 2003; Zhu and Liu 2006). However, there are many other use-cases where colour recognition is useful.

Colour recognition can be done with mathematical algorithms, which remain the dominant approach, but during the last decade artificial intelligence has been applied successfully in many use-cases. In the artificial intelligence context, both supervised and unsupervised learning have proved viable. Unsupervised learning is typically used for clustering colours or for extracting dominant colours from a source image (Du et al. 2004; Kuo et al. 2005; Bo et al. 2013; Basar et al. 2020). Supervised learning has been found more suitable for higher-level use-cases such as object colour recognition (Zhang et al. 2019; Aarathi and Abraham 2017; Feng et al. 2019).

Table 1 Related past research

As seen from Table 1, the most relevant past research on unsupervised learning has focused on agriculture and healthcare use-cases.

Banic et al. used unsupervised colour clustering for image colour calibration; they developed a custom clustering approach and achieved results where the median angular error was almost always below 2° (Banic and Loncaric 2018).

Gerke and Xiao studied two classification strategies: a supervised method (Random Trees) and an unsupervised approach. They also used graph-cuts for energy optimisation. Their results achieved 97.74% accuracy in the recognition of urban objects; however, the methods had challenges with shadows (Gerke and Xiao 2014).

Dresp-Langley and Wandeto used the quantisation error of a Self-Organising Map (SOM). With the quantisation error, their purpose was to recognise increases in the amounts of red or green pixels (Dresp and Wandeto 2020). The results of their research were good, but only colour amounts were used, not intensity.

Yavuz and Köse achieved good results in colour clustering in a blood vessel extraction use-case, even with small colour differences. The authors used a combination of K-means and Fuzzy C-means, and applied postprocessing to remove falsely segmented isolated regions (Yavuz and Köse 2017).

Abdalla et al. used a subsequent combination of various unsupervised learning methods (GMM, SOM, FCM and K-means). They proposed methods to overcome illumination and weather condition challenges in the segmentation of in-field oilseed rape images, and achieved a segmentation accuracy of 96% even in challenging conditions (Abdalla et al. 2019).

Basar et al. developed a novel approach to overcome challenges in the initialisation of clustering algorithms. Their research focuses on the challenge of defining the number of clusters and the initial central points of clusters. Their results improved segmentation quality and reduced classification error (Basar et al. 2020).

Wang et al. achieved good results, even with small colour contrast changes, using first-order colour moments, second-order colour moments, and colour histogram peaks. Their objective was to extract feature vectors from the image and to realise dimension reduction of the data. The use-case was classifying solid wood panels with K-means (Wang et al. 2021).

Related work shows that unsupervised learning can be used to classify colours. However, there are not many studies that have focused on the recognition of small colour differences, especially in printed colours. Some studies have also clearly pointed out that unsupervised learning approaches struggle when foreground and background objects have only a slight colour difference.

3 Materials and methods

The dataset analysed in this research is available in the Zenodo repository (Isohanni 2023). This study used 25 different modified QR-codes as the original dataset. The QR-codes had three colour zones: black, white, and colour (Fig. 1).

Fig. 1

QR-code sample

The black zone (1.) was printed with pure black (100K / CMYK(1.0, 1.0, 1.0, 1.0)), the white zone (2.) had no colour (paper white, 0K / CMYK(0.0, 0.0, 0.0, 0.0)), and the third zone (3.) was printed with some colour. All zones had equal size. In the example in Fig. 1, a colour area with (20M / CMYK(0.0, 0.2, 0.0, 0.0)) is presented, meaning the colour has 20% saturation in the magenta channel. The dataset was created by printing colour areas with different ink saturations (20%, 40%, 60%, 80% and 100%) and with different colours. Examples of QR-codes with colour saturation 20–100% in the magenta (M) channel are illustrated in Fig. 2. The other colours or colour combinations used were C (cyan), Y (yellow), K (black) and the combination CY (green). Further experiments were also made with ink saturations of 10% and 5%, using the unsupervised learning methods that performed best in the first experiments. QR-codes were printed at a size of 20 mm \(\times \) 20 mm. The printer used in this research was a standard office laser printer (Canon ImageRunner C5535i), and the paper was standard office A4 paper (Canon Black Label Plus 80 g/m2). Printing was done at 300 dpi.

Fig. 2

Different intensity samples

All QR-codes were captured into an image dataset which contained 25–30 images per QR-code. Different environments were used to capture the images. Some of the images were captured in normal office ambient light of around 500 lux and a colour temperature of around 5000 K; this higher ambient light environment results in more separable colours, but also in more small details (noise) in the images. Some of the images were captured in a home environment with around 250 lux and a 3000 K colour temperature. In darker environments, digital cameras might not be able to capture colour information properly (Zamir et al. 2021). Examples of the differences between these two environments are shown in Fig. 3.

Fig. 3

Samples from different environments

Image a) is captured in higher ambient light. This results in more detail and clear transitions between colours. Image b) is from lower and warmer ambient light; in these images it can also be seen that the camera focus makes the image blurry and the colours not as clearly separable; however, the image has less noise.

Fig. 4

Flowchart of the process used

All images were taken with an iPhone 11 Pro using the standard camera application from a distance of around 30 cm, and stored as JPGs in RGB format with 8 bits per channel. Captured images were resized to 1200\(\times \)1600 resolution in Photoshop before processing them; no other processing was done. After resizing, the QR-code occupied around a 400\(\times \)400 px area of the image, and each zone was roughly 50\(\times \)50 px.

3.1 Process as whole

The process used in this research is shown in Fig. 4. The process starts from the RGB JPEG image and results in CIELAB values of three cluster centres (white, black, colour; Fig. 5) and the Delta-E between the white and colour clusters. Delta-E is a measurement that ranges between 0 and 100; it quantifies the difference between two colours and can be used to determine whether two colours are different (Luo et al. 2001). The formula to calculate Delta-E is

$$\begin{aligned}&\Delta E_{00}\\&\quad = \sqrt{ \left( \frac{\Delta L'}{k_L \cdot S_L} \right) ^2 + \left( \frac{\Delta C'}{k_C \cdot S_C} \right) ^2 + \left( \frac{\Delta H'}{k_H \cdot S_H} \right) ^2 + R \cdot \frac{\Delta C'}{k_C \cdot S_C} \cdot \frac{\Delta H'}{k_H \cdot S_H} }, \end{aligned}$$

where

$$\begin{aligned} \Delta E_{00}&: \text {Delta-E 2000 colour difference} \\ \Delta L'&: \text {Difference in lightness} \\ \Delta C'&: \text {Difference in chroma} \\ \Delta H'&: \text {Difference in hue} \\ k_L, k_C, k_H&: \text {Weighting factors} \\ S_L, S_C, S_H&: \text {Adjustment factors based on standard}\\&\quad \text { deviations} \\ R&: \text {Rotation function} \end{aligned}$$

The CIELAB colour system represents the quantitative relationships of colours on three axes: lightness (L) and the chromaticity coordinates (a, b).

After reading the input image, auto-levelling of colours was performed. Auto-levelling uses the following equation:

$$\begin{aligned} I' = \frac{{I - I_{\text {min}}}}{{I_{\text {max}} - I_{\text {min}}}} \times 255, \end{aligned}$$

where \(I_\text {max}\) is the brightest white value in the image and \(I_\text {min}\) is the darkest black value. Auto-levelling performs a histogram equalisation to achieve a more uniform distribution of values in the range [0, 255] in all (R, G, B) channels (Kao et al. 2006). For the value 0, the mean value of the black area of the image (Fig. 1, area 1.) was used, and for the value 255, the mean value of the white area (Fig. 1, area 2.) was used. The other values of the image are then stretched, and as seen in Fig. 5 this makes the differences between colours in the image clearer. Histogram equalisation improves the contrast of an image but might lead to over-enhancement; other image enhancement methods could also be used, but this is out of the scope of this research and is discussed in the conclusions. The result of auto-levelling is shown in Fig. 5, where the leftmost image is the original and the one to its right is the image after auto-levelling. The three squares at the bottom right of the figure are the extracted colour areas after auto-levelling. Data from these areas is grouped into one dataframe. The dataframe uses the CIELAB colour format, so each row contains one pixel's L, A and B values.
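The zone-anchored stretch above can be sketched in a few lines of NumPy. This is a minimal illustration assuming the per-channel zone means have already been measured; the function name `auto_level` and the clipping to [0, 255] are illustrative assumptions, not the study's exact implementation.

```python
import numpy as np

def auto_level(img, black_mean, white_mean):
    """Per-channel linear stretch: the measured black-zone mean maps to 0,
    the white-zone mean maps to 255, and values outside are clipped."""
    img = img.astype(np.float64)
    black = np.asarray(black_mean, dtype=np.float64)
    white = np.asarray(white_mean, dtype=np.float64)
    out = (img - black) / (white - black) * 255.0
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```

Anchoring the stretch to the printed black and white zones, rather than to the global image minimum and maximum, makes the mapping robust to stray dark or bright pixels elsewhere in the image.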

Fig. 5

Extraction of analysis area

The dataframe was then processed by the unsupervised learning methods. A visual example of the dataframe is presented in Fig. 6, where the LAB colour values are plotted as a density chart. The density of points is shown in different colours, dark blue being the least dense and red the densest. The circles in the figure are added for illustrative purposes; they show the different colour areas (red = black, green = white, yellow = CMYK(0.0, 0.6, 0.0, 0.0)). This is also the result the unsupervised learning is expected to achieve. The red and green clusters should stay in roughly the same location across images, but the yellow cluster moves to different locations in the 3D space. The figure also shows that a lot of noise is present; the noise comes from the digital image and the imaging environment.

Fig. 6

Illustration of LAB colour data points intensities

Clustering of colours was done with different approaches:

  • Centroid-based algorithms organise the data in non-hierarchical clusters. These algorithms use distance measuring, like Euclidean, between points to determine if points belong to the same cluster. Usually, centroid-based algorithms run iterations and update cluster centres in each iteration. Centroid-based algorithms are efficient and fast. However, they are sensitive to initial cluster centres and outliers Gonzalez (1985); Hartigan and Wong (1979).

  • Connectivity-driven clustering, often referred to as hierarchical clustering, operates under the assumption that points tend to have stronger connections with nearby points compared to those that are farther apart. Algorithms based on connectivity use the distances between points to create clusters. The goal is to minimise the maximum distance required to link these points together Reddy (2021).

  • Density-based clustering defines clusters as dense regions of space. Between dense regions, there are regions where data density is lower. Low-density region can also be empty. Density-based clustering algorithms are good at finding arbitrarily shaped clusters, but they have difficulty when it comes to varying densities and if data has high dimensions (Kriegel et al. 2011).

  • Distribution-based clustering assumes that the data comes from a specified number of distributions. Each of these distributions has its own mean and variance (or covariance); the distributions can be Gaussian, for example. A point's probability of belonging to a distribution decreases as its distance from the distribution centre increases Xu et al. (1998).

In this research, K-means, Fuzzy C-means, DBSCAN, MeanShift, hierarchical clustering, spectral clustering, Gaussian Mixture Model (GMM), BIRCH and OPTICS from scikit-learn (Pedregosa et al. 2011) were used as unsupervised learning methods. All of these methods were used to group the dataframe's 3D points (L, A, B) into clusters; if a method had an option to specify the number of clusters, it was set to three. The methods used are explained in more detail in the following sub-chapters. The objective of each method was to find the cluster centre, or the average value of the points belonging to the same cluster.

After clustering, the clusters were labelled: the cluster closest to CIELAB(1.0, 0.0, 0.0) was labelled “white”, and the cluster closest to CIELAB(0.0, 0.0, 0.0) was labelled “black”. The remaining cluster was labelled “colour”. If only two clusters, or more than three major clusters, were found, the result of the clustering process was considered failed.
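The labelling rule can be sketched as follows. The reference points assume L on the usual 0–100 CIELAB scale (the text's CIELAB(1.0, 0.0, 0.0) is the same white point with L normalised to 1); the function name is illustrative.

```python
import numpy as np

def label_clusters(centres):
    """Label three CIELAB cluster centres (L assumed on the 0-100 scale):
    nearest to white -> 'white', nearest to black -> 'black',
    and the remaining centre -> 'colour'."""
    centres = np.asarray(centres, dtype=float)
    if centres.shape != (3, 3):
        raise ValueError("expected exactly three (L, a, b) cluster centres")
    white_ref = np.array([100.0, 0.0, 0.0])
    black_ref = np.array([0.0, 0.0, 0.0])
    white_i = int(np.argmin(np.linalg.norm(centres - white_ref, axis=1)))
    black_i = int(np.argmin(np.linalg.norm(centres - black_ref, axis=1)))
    if white_i == black_i:  # degenerate result -> clustering failed
        raise ValueError("white and black map to the same cluster")
    (colour_i,) = set(range(3)) - {white_i, black_i}
    labels = {white_i: "white", black_i: "black", colour_i: "colour"}
    return [labels[i] for i in range(3)]
```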

Finally, the CIEDE2000 Delta-E between “white” and “colour” was calculated. This value was then compared to the ground-truth Delta-E calculated from the mean values of the white and colour areas. The result of the comparison was stored in a .CSV file with other image information for the later analysis discussed in the results section.
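The comparison step can be illustrated as follows. To stay self-contained, the sketch uses the simple CIE76 Delta-E (plain Euclidean distance in LAB) as a stand-in for the CIEDE2000 formula used in the study, and it interprets success as the clustered Delta-E lying within the threshold of the ground truth; both function names are illustrative.

```python
import numpy as np

def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance in CIELAB space.
    (A simple stand-in for the CIEDE2000 formula used in the study.)"""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))

def is_success(de_clustered, de_ground_truth, threshold=2.0):
    """The clustered white-colour Delta-E must lie within `threshold`
    of the ground-truth Delta-E for the image to count as a success."""
    return abs(de_clustered - de_ground_truth) <= threshold
```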

3.2 K-means

The K-means clustering algorithm belongs to the category of partition-based clustering algorithms. K-means uses an iterative process to partition n observations into k clusters, minimising the sum of the squared Euclidean distances of each point to its cluster centroid. K-means works by first choosing k points as initial cluster centres, which can be done in multiple ways. The algorithm then calculates the distance between each cluster centre and each point, and assigns each point to the closest cluster centre. After this, the mean of all the data points in each cluster is calculated and used as the new centroid for that cluster, and all points are again assigned to the nearest centroid. This process is repeated until the centroids do not change or until a predetermined number of iterations has been reached. K-means is so-called hard clustering, where a point can belong to only one cluster Lloyd (1982); MacQueen et al. (1967).

This research uses a standard implementation of K-means where the algorithm is given a fixed number of clusters before running. The “lloyd/full” Lloyd (1982) and “elkan” Elkan (2003) algorithms are experimented with. The difference between the two is that “full” is an expectation-maximisation (EM) algorithm, whereas “elkan” uses the triangle inequality.

K-means aims to minimise the total intra-cluster variance, for which the following squared error function is used:

$$\begin{aligned} J = \sum _{i=1}^{n} \sum _{j=1}^{k} w_{ij} \cdot \Vert x_i - c_j \Vert ^2, \end{aligned}$$

where J is the distortion measure, n is the number of data points, k is the number of clusters, \(x_i\) represents a data point, \(c_j\) represents a cluster centroid, and \(w_{ij}\) is a binary indicator variable indicating whether data point \(x_i\) is assigned to cluster j.
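On synthetic LAB-like data, this step looks as follows with scikit-learn. The three blobs merely stand in for the black, white and colour zones; all values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal([10, 0, 0], 1.0, size=(100, 3)),     # "black" zone pixels
    rng.normal([95, 0, 0], 1.0, size=(100, 3)),     # "white" zone pixels
    rng.normal([60, 40, -20], 1.0, size=(100, 3)),  # "colour" zone pixels
])

# Three clusters, as in the study; n_init restarts guard against bad seeds.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
centres = km.cluster_centers_  # one (L, a, b) row per cluster
```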

3.3 Fuzzy C-mean

The Fuzzy C-means (FCM) is a so-called soft clustering method. In FCM, points can belong to two or more clusters; each point belongs to every cluster to a certain degree. Points located near the centroid of a cluster have a high degree of belonging to that cluster, and points located far from the centre have a low degree Bezdek et al. (1984).

Fuzzy C-means starts with an initial guess of the cluster centres for a predefined number of clusters. FCM then assigns every data point a membership grade for each cluster. In the same way as K-means, C-means works iteratively and moves the cluster centres to the right locations. FCM's iteration is based on minimising an objective function that represents the distance from any given data point to a cluster centre, weighted by that data point's membership grade Bezdek et al. (1984).

With C-means, different fuzziness parameters from 2.0 to 5.0 are experimented with. The fuzziness parameter is a key parameter of the FCM algorithm: larger fuzziness values blur the clusters, until finally all points belong to all clusters Zhou et al. (2014).

In C-means, the following objective function is used:

$$\begin{aligned} J = \sum _{i=1}^{n} \sum _{j=1}^{k} u_{ij}^m \cdot \Vert x_i - c_j \Vert ^2, \end{aligned}$$

where J is the objective function, n is the number of data points, k is the number of clusters, \(x_i\) represents a data point, \(c_j\) represents a cluster centroid, \(u_{ij}\) is the fuzzy membership value of data point \(x_i\) in cluster j, and m is the fuzziness parameter (a positive constant).
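For fixed centres, the membership grades \(u_{ij}\) implied by this objective can be computed in closed form. The following is a sketch of the standard FCM membership update, not the exact implementation used in the study:

```python
import numpy as np

def fcm_memberships(points, centres, m=2.0):
    """Standard FCM membership update for fixed centres:
    u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1)),
    where d_ij is the distance from point i to centre j."""
    points = np.asarray(points, float)
    centres = np.asarray(centres, float)
    d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)               # guard against division by zero
    ratio = d[:, :, None] / d[:, None, :]  # ratio[i, j, l] = d_ij / d_il
    return 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)  # rows sum to 1
```

As m grows, the exponent 2/(m - 1) shrinks, so the grades of all clusters approach each other; this is the blurring effect of a large fuzziness parameter described above.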

3.4 DBSCAN

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is an algorithm designed for clustering data points in the presence of noise. When provided with a collection of points, DBSCAN organises them into clusters by assessing the density of their arrangement: points that are closely packed together form dense regions and are grouped accordingly. However, if a point is situated significantly apart from its neighbouring points, DBSCAN identifies it as noise, i.e. an outlier Ester et al. (1996).

DBSCAN works based on two parameters:

  • \(\epsilon (eps) \), two points are considered neighbours if the distance between them is smaller than epsilon.

  • minPts the minimum number of points required to form a dense region Ester et al. (1996).

DBSCAN is very sensitive to these parameters. minPts is easier to decide, as it can be determined from the total pixel count and the cluster count. However, \(\epsilon (eps) \) is more complex, and some past research has looked into ways to determine the optimal \(\epsilon \) value (Giri and Biswas 2020). DBSCAN does not have a single objective function like K-means or C-means, but it can be described with the following algorithm.

  1. Identify core points based on the density criterion.

  2. Connect core points to form clusters using density-reachability.

  3. Assign border points to clusters if they are density-reachable from a core point.

  4. Identify noise points that are neither core points nor density-reachable from core points.

In this research, \(\epsilon (eps) \) = 3.0, 5.0 and 1.0 are used, and \(minPts = (T / n) \times 0.8\), where T is the total pixel count and n is the cluster count. As a result, each cluster must contain at least 80% of the pixels it would have if the total pixel count were divided evenly into n clusters. After running the DBSCAN algorithm, each density group's average colour value is taken to represent the whole group.
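A sketch of this parameterisation with scikit-learn, on synthetic data with a deliberate outlier; the blob values are illustrative, not taken from the dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal([10, 0, 0], 0.5, size=(200, 3)),
    rng.normal([90, 0, 0], 0.5, size=(200, 3)),
    [[50.0, 50.0, 50.0]],  # an isolated point DBSCAN should flag as noise
])

n = 2                                 # expected cluster count
min_pts = int(len(data) / n * 0.8)    # the 80% of T / n rule from the text
db = DBSCAN(eps=3.0, min_samples=min_pts).fit(data)

n_found = len(set(db.labels_) - {-1})  # label -1 marks noise points
```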

3.5 MeanShift

MeanShift is an unsupervised learning algorithm that works in iterations. On each iteration, the algorithm shifts points in the direction where the region has the highest density of data points, updating the candidate centroids to be the mean of the points within a given region. This region is defined by the bandwidth, which is the only parameter given to the MeanShift algorithm. After the updates, MeanShift filters out near-duplicates so that a final set of centroids is left Wu and Yang (2007).

MeanShift’s iterative optimization algorithm, which results into a vector that represents the direction in which the density increases the most at the location of the data point, can be expressed as follows:

$$\begin{aligned} \Delta x = \frac{\sum _{i=1}^{n} K(x - x_i) \cdot x_i}{\sum _{i=1}^{n} K(x - x_i)} - x, \end{aligned}$$

where K(x) is a kernel function, often a Gaussian kernel, \(\Delta x\) is the MeanShift vector for a data point x, and \(x_i\) are the other data points in the dataset. This research uses a bandwidth equal to the median of all pairwise distances. Calculating this bandwidth is slow, as it takes time at least quadratic in the point count.
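The median-pairwise-distance bandwidth can be sketched as follows on a toy two-blob example (illustrative values only):

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import MeanShift

rng = np.random.default_rng(2)
data = np.vstack([
    rng.normal([10, 0, 0], 1.0, size=(120, 3)),
    rng.normal([90, 0, 0], 1.0, size=(40, 3)),
])

# Bandwidth = median of all pairwise distances; pdist is O(n^2) in the
# point count, which is why this step is slow on large images.
bandwidth = float(np.median(pdist(data)))
ms = MeanShift(bandwidth=bandwidth).fit(data)
```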

3.6 Hierarchical clustering

Hierarchical clustering algorithms build nested clusters by merging or splitting them successively. The final hierarchy of a dataset is represented as a tree: the root of the tree gathers all the samples together, and the leaves are clusters with only one sample. Hierarchical clustering depends on the so-called linkage function, which defines the distance between any two subsets. Linkage functions developed in the past include single linkage, average linkage, complete linkage and Ward linkage Nielsen (2016).

This research uses AgglomerativeClustering, a version of hierarchical clustering that uses a bottom-up approach (Zhao and Qi 2010). The algorithm starts from the situation where each point is its own cluster (leaf) and then merges clusters using Ward's linkage criterion. Ward linkage analyses the variance of clusters and minimises the sum of squared differences within all clusters (Miyamoto et al. 2015). The variance, also known as the sum of squares, is calculated from the squared Euclidean distance between data points and the centroid of the cluster.

$$\begin{aligned} D(X, Y) = \frac{N_X N_Y}{N_X + N_Y} \cdot \Vert C_X - C_Y \Vert ^2 \end{aligned}$$

where D(X, Y) is the distance between clusters X and Y, \(N_X\) and \(N_Y\) are the numbers of elements in clusters X and Y respectively, and \(\Vert C_X - C_Y \Vert ^2\) is the squared Euclidean distance between the centroids of clusters X and Y.
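Since AgglomerativeClustering exposes only labels, not centroids, a small sketch (on illustrative data) shows how the cluster centres used later in the pipeline could be recovered:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(3)
data = np.vstack([
    rng.normal([10, 0, 0], 1.0, size=(60, 3)),
    rng.normal([95, 0, 0], 1.0, size=(60, 3)),
    rng.normal([60, 40, -20], 1.0, size=(60, 3)),
])

agg = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(data)
# Ward linkage gives no explicit centroids; take each group's mean instead.
centres = np.array([data[agg.labels_ == k].mean(axis=0) for k in range(3)])
```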

3.7 Spectral clustering

Spectral clustering is very useful when the shape of a cluster is non-convex, because it focuses on the connectivity rather than the compactness of the cluster. This can be the case, for example, when a cluster has the shape of an arch, or when clusters are nested circles. Spectral clustering measures the given data points by calculating their pairwise similarities with a chosen similarity function, which is symmetric and non-negative; this research uses one based on the Euclidean distance. This results in a similarity matrix, which is used in unnormalised or normalised spectral clustering Ng et al. (2001).

The Euclidean-distance-based similarity function is expressed as

$$\begin{aligned} \text {Similarity}\;(x_i, x_j) = \frac{1}{1 + \Vert x_i - x_j \Vert } \end{aligned}$$

where \(x_i\) and \(x_j\) are data points.
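A sketch of spectral clustering with a similarity matrix built from the formula above (toy data; scikit-learn's SpectralClustering accepts such a matrix via affinity="precomputed"):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(4)
data = np.vstack([
    rng.normal([10, 0, 0], 1.0, size=(40, 3)),
    rng.normal([90, 0, 0], 1.0, size=(40, 3)),
])

# Pairwise similarity 1 / (1 + Euclidean distance): symmetric, non-negative.
d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
similarity = 1.0 / (1.0 + d)

sc = SpectralClustering(n_clusters=2, affinity="precomputed",
                        random_state=0).fit(similarity)
```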

3.8 OPTICS

OPTICS (Ordering Points To Identify the Clustering Structure) was developed to address DBSCAN's weakness when data has varying density. OPTICS does this by linearly ordering the dataset points so that points which are spatially closest become neighbours in a density-based representation called the reachability plot. In this plot, every point has a reachability distance, which defines how easily the point can be reached from other points. Clusters are then formed based on reachability distances Ankerst et al. (1999).

The reachability distance is calculated with the following function:

$$\begin{aligned} r_{\text {dist}}(p, q) = \max (\text {dist}(p, q), \text {core-distance}(q)) \end{aligned}$$

where p and q are data points, dist(p, q) is the Euclidean distance between points p and q, and \(\text {core-distance}(q)\) is the radius within which a certain density threshold (\(\epsilon \) or MinPts) is satisfied. OPTICS in this research uses the same parameters as DBSCAN.
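An OPTICS sketch on data with two different densities, using DBSCAN-style cluster extraction to mirror the shared parameters (all values illustrative):

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(5)
data = np.vstack([
    rng.normal([10, 0, 0], 0.5, size=(100, 3)),  # dense region
    rng.normal([90, 0, 0], 2.0, size=(100, 3)),  # sparser region
])

# cluster_method="dbscan" extracts clusters at a fixed eps, mirroring the
# DBSCAN parameters; reachability_ holds the reachability-plot distances.
opt = OPTICS(min_samples=20, cluster_method="dbscan", eps=6.0).fit(data)
reachability = opt.reachability_[opt.ordering_]
```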

3.9 GMM

The GMM (Gaussian mixture model) is a finite mixture probability distribution model. GMM assumes that all data points are generated from a mixture of a finite number of Gaussian distributions whose parameters are unknown. Each Gaussian distribution is defined by its mean and covariance; the whole GMM is built of mean vectors (\(\mu \)) and covariance matrices (\(\Sigma \)). GMM uses an iterative expectation-maximisation method to estimate these parameters Rasmussen (2000). A Gaussian mixture model is represented by the following probability density function:

$$\begin{aligned} p({\textbf{x}}) = \sum _{k=1}^{K} \pi _k \cdot {\mathcal {N}}({\textbf{x}} \, \vert \, \varvec{\mu }_k, \varvec{\Sigma }_k). \end{aligned}$$
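A GaussianMixture sketch on illustrative LAB-like blobs; the fitted means_ correspond to the \(\mu _k\) vectors and weights_ to the mixing coefficients \(\pi _k\):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
data = np.vstack([
    rng.normal([10, 0, 0], 1.0, size=(80, 3)),
    rng.normal([95, 0, 0], 1.0, size=(80, 3)),
    rng.normal([60, 40, -20], 1.0, size=(80, 3)),
])

# covariance_type="full" lets every component take any shape/orientation;
# "diag" constrains the contour axes to the coordinate axes.
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(data)
means = gmm.means_      # estimated mu_k vectors
weights = gmm.weights_  # mixing coefficients pi_k, summing to 1
```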

3.10 Birch

BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a hierarchical clustering algorithm. BIRCH incrementally builds a tree-like data structure, the Clustering Feature (CF) tree, which summarises information about the dataset and is built top-down. The algorithm recursively splits the data into subclusters. It does not use traditional distance-based split criteria like other clustering algorithms; the split conditions are based on factors such as the number of points in a subcluster or the sum of squared feature values, and the actual split logic is more intricate due to the CF-tree structure and the desire to keep the tree balanced. The BIRCH algorithm has two main parameters: the maximum number of subclusters that can be generated from a single cluster, and the threshold distance. These parameters determine the size and depth of the CF tree Zhang et al. (1997).
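A Birch sketch showing the two main parameters, threshold and branching_factor (data values are illustrative):

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(7)
data = np.vstack([
    rng.normal([10, 0, 0], 1.0, size=(80, 3)),
    rng.normal([90, 0, 0], 1.0, size=(80, 3)),
])

# threshold: radius for absorbing a sample into an existing subcluster;
# branching_factor: maximum number of CF subclusters per tree node.
birch = Birch(threshold=5.0, branching_factor=50, n_clusters=2).fit(data)
```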

Table 2 Results of K-means algorithm

4 Results

The results of this research were obtained by running the process described in the previous chapter. Clustering was performed with a 2.3 GHz Quad-Core Intel Core i5 processor. The clustering process was considered successful if it was able to recognise three different clusters and the clusters were formed correctly. Correct formation of the clusters was defined as follows: the Delta-E value between white and colour had to differ by 2.0 or less from the ground-truth white-colour Delta-E (calculated in process step 4, Fig. 4). The value 2.0 for Delta-E was selected because past research has identified it as the smallest colour difference an inexperienced human observer can notice (Han et al. 2022). This can be expressed as the following equation:

$$\begin{aligned} {\text {result}}= {\left\{ \begin{array}{ll} {\text {success}},&{} \text {if } \Delta E_{00} \le 2.0\\ {\text {failed}}, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
Table 3 Results of C-means algorithm

Success rate for each method was calculated using the standard formula:

$$\begin{aligned} {\text {success}}\ {\text {rate }}= \left( \frac{{\text {correctly}}\ {\text {clustered}}\ {\text {images}}}{{\text {total}}\ {\text {images}}} \right) . \end{aligned}$$

The results of running all unsupervised clustering methods on the whole 620-image dataset are presented in the following tables. In these tables, the success rate describes how large a share of the images was successfully clustered into three clusters where white and colour had a Delta-E within 2.0 of the ground truth. In the method column, the method and its possible parameters are described; some methods were tested with multiple parameters so that the best parameters and their combination were found. Runtime is the average runtime per image.

Table 4 Results of DBSCAN algorithm
Table 5 Results of GMM algorithm
Table 6 Results of BIRCH algorithm
Table 7 Results of hierarchical clustering algorithm
Table 8 Results of spectral clustering algorithm

K-means (Table 2) achieved a very high success rate. It had challenges when the density of the ink colour was low (20%), especially in the M and C CMYK components. Being a hard clustering method has its advantages when it comes to recognising colour clusters, because each data point must belong to exactly one cluster. Problems arise from data points that are outliers and pull centres away from their ground truth. Two different algorithms were experimented with, “full” and “elkan”; there seems to be no difference between the two.

C-means (Table 3) seems to fail when the colour to be identified has only the K CMYK component, or when the density of the ink is low (20%). Otherwise, it works well. Close clusters seem to make it harder for the C-means algorithm to differentiate clusters; this comes from C-means being a soft clustering method, so in close clusters data points have some grade of belonging to clusters other than their main cluster. Outliers also have a negative impact when C-means is used. For the best results, the fuzziness parameter must be 4.0 or larger, although the execution time increases with a large fuzziness parameter.

DBSCAN (Table 4) usually recognises three clusters, but sometimes only two, which makes the whole clustering process fail. In these cases, DBSCAN mistakenly identifies a large number of data points as noise, even though they belong to a cluster. As the table shows, DBSCAN was tested with different eps values. DBSCAN fails throughout the dataset, but especially when the colour intensity is between 40 and 80%.
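A sketch of sweeping the eps value, assuming scikit-learn's DBSCAN on synthetic LAB-like data (the values below are invented for illustration). Points labelled -1 are the ones DBSCAN declares noise, which is the failure mode described above: with too small an eps, cluster members are discarded as noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
white  = rng.normal([95, 0, 0],   0.5, size=(200, 3))
black  = rng.normal([20, 0, 0],   0.5, size=(200, 3))
colour = rng.normal([55, 60, -5], 0.5, size=(200, 3))
X = np.vstack([white, black, colour])

results = {}
for eps in (0.1, 1.5):                  # too-small vs. reasonable radius
    db = DBSCAN(eps=eps, min_samples=10).fit(X)
    n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
    n_noise = int(np.sum(db.labels_ == -1))
    results[eps] = (n_clusters, n_noise)
```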

GMM (Table 5) performs very well; failures happen in the same images as with K-means, but in fewer test cases. A challenge with GMM is that it does not explicitly model outliers, so noisy data points can influence the cluster parameters. Different covariance matrices, which define the Gaussian distributions, were tested; the best options were diagonal and full. Full covariance allows each component to adopt any position and shape individually. With diagonal covariance, the contour axes align with the coordinate axes, although the eccentricities of the components may still differ.
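The covariance-type comparison can be reproduced as follows, assuming scikit-learn's GaussianMixture on synthetic LAB-like data (invented for illustration): "full" fits one unrestricted covariance matrix per component, while "diag" restricts each component to axis-aligned contours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
white  = rng.normal([95, 0, 0],   0.8, size=(200, 3))
black  = rng.normal([20, 0, 0],   0.8, size=(200, 3))
colour = rng.normal([55, 60, -5], 0.8, size=(200, 3))
X = np.vstack([white, black, colour])

labels = {}
for cov in ("full", "diag"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov, random_state=0)
    labels[cov] = gmm.fit_predict(X)    # soft model, hardened by most likely component
```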

The BIRCH (Table 6) algorithm solves the clustering problem better than expected, as the dataset is not generally considered hierarchical. Like the other clustering algorithms, BIRCH is vulnerable to outliers. BIRCH also depends on the ordering of the data points, which was not considered in this research. For BIRCH, two parameters and their values were tested: the threshold value defines the radius of the subcluster used when merging samples into subclusters, and the branching factor defines the maximum number of subclusters in each node. Even with the best parameters, BIRCH fails throughout the dataset, especially when the colour is green or its intensity is 20%.
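The two BIRCH parameters described above map directly onto scikit-learn's implementation (assumed here; the data and parameter values are illustrative): `threshold` is the subcluster radius used when absorbing new samples into the CF-tree, and `branching_factor` caps the number of subclusters per node.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
white  = rng.normal([95, 0, 0],   1.0, size=(200, 3))
black  = rng.normal([20, 0, 0],   1.0, size=(200, 3))
colour = rng.normal([55, 60, -5], 1.0, size=(200, 3))
X = np.vstack([white, black, colour])

# n_clusters=3 applies a final global clustering step on the CF-tree leaves
birch = Birch(n_clusters=3, threshold=0.5, branching_factor=50)
labels = birch.fit_predict(X)
n_found = len(set(labels))
```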

Hierarchical clustering (Table 7) algorithms are sensitive to noise and outliers, as these can affect the linkage and structure of the clusters; nevertheless, hierarchical clustering works well here. Distances between data points in the LAB colour space can easily be calculated using Euclidean distance, which is then used to allocate points to clusters. In this research, the hierarchical clustering affinity matrix was constructed using Euclidean distance, and four different linkage options were tested. Ward linkage, which minimises the variance of the clusters being merged, performed best. The most problematic cases for hierarchical clustering were the 20% M and 20% C images, and it also sometimes fails with the green colour.
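A sketch of the Ward-linkage setup, assuming scikit-learn's AgglomerativeClustering on illustrative synthetic data; Ward linkage requires Euclidean distances, which matches the LAB-space distances described above.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
white  = rng.normal([95, 0, 0],   1.0, size=(200, 3))
black  = rng.normal([20, 0, 0],   1.0, size=(200, 3))
colour = rng.normal([55, 60, -5], 1.0, size=(200, 3))
X = np.vstack([white, black, colour])

# Ward linkage merges, at each step, the pair of clusters whose merge
# least increases the total within-cluster variance
hc = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = hc.fit_predict(X)
n_found = len(set(labels))
```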

Table 9 Results of Meanshift algorithm

The nature of the problem does not especially call for spectral clustering (Table 8); still, spectral clustering achieves a fairly high success rate. Spectral clustering does not explicitly handle noisy data points or outliers: outliers can affect the affinity matrix and potentially lead to the formation of unwanted clusters. When the affinity matrix is computed using nearest neighbours, the algorithm fails to run; however, it works when the affinity matrix is constructed using a radial basis function (RBF) kernel.
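The affinity choice can be expressed as follows, assuming scikit-learn's SpectralClustering on illustrative synthetic data. With the RBF kernel, pairwise affinities exp(-gamma * d^2) make well-separated LAB clusters behave as nearly disconnected components of the similarity graph, which the spectral embedding then separates cleanly.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
white  = rng.normal([95, 0, 0],   1.0, size=(200, 3))
black  = rng.normal([20, 0, 0],   1.0, size=(200, 3))
colour = rng.normal([55, 60, -5], 1.0, size=(200, 3))
X = np.vstack([white, black, colour])

sc = SpectralClustering(n_clusters=3, affinity="rbf", random_state=0)
labels = sc.fit_predict(X)
n_found = len(set(labels))
```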

Table 10 Results of OPTICS algorithm
Fig. 7 Failed images in experiment one

Table 11 Results of the second experiment
Fig. 8 Failed images in experiment two

The Meanshift (Table 9) algorithm is very slow on the given problem. A larger problem was that Meanshift was not able to cluster the data points correctly. Meanshift's problems may stem from an incorrect bandwidth value or noisy data, but even if Meanshift worked correctly, it would still be very slow on this problem. Different quantile sizes were also tested, but Meanshift was still unable to recognise the clusters.
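Meanshift's bandwidth is typically estimated from a quantile of pairwise distances; the sketch below shows the scikit-learn pattern that matches the quantile sweep described above (scikit-learn assumed, data and values invented for illustration).

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)
white  = rng.normal([95, 0, 0],   1.0, size=(200, 3))
black  = rng.normal([20, 0, 0],   1.0, size=(200, 3))
colour = rng.normal([55, 60, -5], 1.0, size=(200, 3))
X = np.vstack([white, black, colour])

# The quantile controls how large a neighbourhood the bandwidth estimate covers
bw = estimate_bandwidth(X, quantile=0.3, random_state=0)
ms = MeanShift(bandwidth=bw).fit(X)
n_found = len(ms.cluster_centers_)   # one centre per detected mode
```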

As OPTICS (Table 10) is based on the same core idea as DBSCAN, it was expected to behave quite similarly. However, OPTICS achieved only a 70% success rate with the best parameters, while also being slower than DBSCAN. Still, OPTICS's hierarchical approach seems to work better than DBSCAN.
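OPTICS shares DBSCAN's density model but first orders the points by reachability; clusters can then be extracted with a DBSCAN-style cut, as sketched below (scikit-learn assumed, data and parameter values illustrative).

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)
white  = rng.normal([95, 0, 0],   0.5, size=(200, 3))
black  = rng.normal([20, 0, 0],   0.5, size=(200, 3))
colour = rng.normal([55, 60, -5], 0.5, size=(200, 3))
X = np.vstack([white, black, colour])

# Build the reachability ordering, then extract clusters with a cut at eps=1.5
opt = OPTICS(min_samples=10, max_eps=3.0, cluster_method="dbscan", eps=1.5).fit(X)
n_clusters = len(set(opt.labels_)) - (1 if -1 in opt.labels_ else 0)
```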

One more algorithm, Affinity Propagation, was also tested, but it failed to perform on the given dataset: all runs ended with no clusters detected, because affinity propagation did not converge.

Figure 7 shows some of the images where clustering failed with all of the best algorithms. In image a) the colour is CMYK(0,0,0,0.6) gray, in b) and c) (1.0,1.0,0,0) green, and in d) (0,0,0.2,0) magenta. In image a) the colour is so dark that clustering results in two main clusters instead of three. In b) and c) the ambient lighting is challenging, making it too hard for the clustering algorithms to recognise the green colour. Finally, in image d) there is a lot of noise and the print quality is low, which leads to incorrect results.

Finally, a second experiment was carried out with the best-performing algorithms, K-means, C-means (m = 3.0), GMM (covariance = full), hierarchical clustering (affinity = euclidean, linkage = ward), and spectral clustering (affinity = rbf), using a smaller dataset. This dataset contained colours with ink densities of 0%, 5%, and 10%, captured in the same environments as the images used in the first experiment. In total, this dataset contained 230 images. The results of this experiment are shown in Table 11.

In the second experiment, the success rate of all algorithms dropped, but all of them still performed well, with over 90% success rate when the colour intensity was 10%. When the intensity dropped to 5%, however, only K-means achieved close to a 90% success rate. The challenges were mostly with the magenta and yellow colours; some of the failed images are shown in Fig. 8. In this figure, images a) and b) have an intensity of 10%, while c) and d) have an intensity of 5%.

5 Conclusion and discussion

The results show that the unsupervised clustering methods K-means, C-means, GMM, hierarchical clustering, and spectral clustering can be used to recognise colour differences in printed CMYK colours, especially when the difference in ink density on at least one CMYK channel is 20% or more. In these cases, GMM achieves a 99.0% success rate, followed by spectral clustering and K-means. These results show that if a high success rate is desired, using ink densities in 20% intervals is a good approach. The best parameter options for the methods are K-means with K-means++ initialisation, C-means with fuzziness parameter 3.0, GMM with full covariance, hierarchical clustering with Euclidean affinity and Ward's linkage, and spectral clustering with RBF affinity. When ink levels drop to 10%, all algorithms still have a success rate over 90%, but when ink levels drop to only 5%, none of the algorithms achieves over 90%.

Based on the results of both experiments, the best algorithm is K-means, which achieves better results than the quite similar C-means, especially in the second, low-ink-density experiment. This is because the fuzziness of the C-means algorithm clusters some data points incorrectly when cluster centres are close to each other. The hard clustering that K-means uses seems to work well at lower ink densities. The K-means algorithm used here relies on the K-means++ initialisation method, which assigns the first centroid randomly and then selects each remaining centroid with probability proportional to the squared distance from the nearest centroid already chosen. K-means++ initialisation seems to work well. Incorrect clustering also happens with hierarchical and spectral clustering: some data points are linked to the wrong cluster because they are close to outliers or noise points of other clusters. In this use case the data points form dense regions with spherical or elliptical shapes, which plays to the advantage of K-means and GMM, but K-means seems to manage outliers better than GMM. While spectral clustering also works well, it has challenges when the ink density is low and tries to form two clusters instead of three.

To get better results from unsupervised clustering, methods such as filtering out outlier data points could be used. This would need to be done in a way that does not discard data points belonging to a cluster, as the number of data points in the images is limited. Pre-processing techniques such as noise filtering and colour enhancement might also improve the results, although in this use case very limited data is available for colour enhancement. It might also be possible to assign the initial locations of one or two centroids (black and white) manually; this might lead to better results but is left for future research.