1 Introduction

Human nails, mainly made of keratin, have a color similar to that of the skin. In general, both regions have a fairly flat texture. Although their shape tends to be circular, there is a wide diversity regarding their eccentricity and shape of contours. Thus, limited assumptions can be made to reliably characterize nails in terms of their visual appearance.

The problem of toenail segmentation has not been widely studied. To the best of our knowledge, no studies have been exclusively conducted to segment toenails. However, toenail segmentation is closely related to the segmentation of fingernails in images. Most existing studies on fingernails have investigated biometric systems and disease detection. Mente and Marulkar [19] reviewed various studies based on fingernail disease detection using image analysis. In addition, nail image processing for disease detection was reviewed by Juna V. V. and Dinil Dhananjayan [11].

Moreover, many studies explored segmenting nails from images captured in environments with a controlled perspective and constant lighting. Under these conditions, simple methods, such as considering the color difference between the nail and hand, can be employed.

Some approaches are based on the color difference between the fingers and nails, implicitly assuming a certain illumination and skin tone. Fukunaka et al. segmented fingernails in the HSV color-space by thresholding performed using experimental values [7]. Moreover, Tolentino et al. [21] used the same approach. Fujishima and Hoshino proposed a fingernail detection method using the distribution density of the nail and finger regions [5], and further improved the accuracy of their method by considering the color continuity [6]. Dessai and Borkar [8] segmented nail colors using L*a*b* color-space and k-means clustering. A similar approach was used by Marulkar and Mente [18], who applied k-means clustering, L*a*b* color-space, and marker-controlled watershed segmentation to achieve a higher accuracy and precision, although it was only applied to a single image. Additionally, Wang et al. performed hand segmentation and fingertip detection using color information [23]. They also designed a color space for this specific task. To segment the fingernail, they used a pixel-wise classifier based on different distributions of the channel Uskin in the skin and fingernail regions. The method proposed by Kumar et al. [12] first finds the nail region of interest (ROI) of the index, middle, and ring fingers using the hand image. Subsequently, it segments the ROI by applying a fixed threshold in a grayscale image, and finally refines the grown nail plate with a Gabor filter. Lee et al. [17] introduced an image preprocessing method for segmenting the lunula and nail plate. To maintain the nail image quality, they used microscopic imagery. In addition, working in a controlled scenario, Easwaramoorthy et al. [4] identified the difficulty in segmenting the nail bed because the edges in the nail images are not continuous. They proposed an algorithm to extract a set of nail semantic points instead of segmenting the nail itself.

Other approaches allow more flexibility in the illumination of the input images; however, they still control the image perspective, typically because these approaches are designed to be used in biometrics. To achieve illumination invariance, Barbosa et al. [3] presented a nail segmentation algorithm using an active shape model and employing local binary pattern features. They also introduced a dataset with controlled capture conditions. Kumuda et al. [13] used the watershed algorithm on a contrast-enhanced grayscale image to segment the distal region of the nails, nail plate, and finger. Subsequently, they [14] considered an illumination correction stage and iterative histogram-based thresholding in each component of the RGB color space to binarize the fingernail, whose shape is classified as either oval, round, or rectangular. Kurniastuti presented a method that used an active shape model [15]. It consisted of three steps: gray scaling, contrast stretching, and contrast repairing in the image using an active shape model. The method, which segments the fingernail area, requires 45 min for each sample.

In contrast to the highly controlled environments considered in some recent studies, we present a robust toenail segmentation algorithm. We also use it to measure the nail area of the big toe in human patients within a clinical trial aimed at objectively quantifying the incidence of a particular pathology. Owing to confidentiality concerns, we cannot provide specific details regarding the clinical trial. To confirm the accuracy of the quantification, it is important to correctly segment the toenail from images captured by different cameras under various angles and lighting conditions. Once the nail is segmented, its area is computed, and the nail region can be further processed.

In this study, 348 samples were collected during the clinical practice for segmenting the toenail. This process is a simple task for humans; however, it is tedious and repetitive. Thus, its automation relieves medical practitioners from this time-consuming effort. Moreover, automatic detection of nails in images of human toes is challenging, and computer vision techniques can automate and standardize well-defined tasks.

The initial objective is to design and implement an automatic system that segments the nail region, given a photograph of a big toe placed on a specially designed template, and to measure its performance against human-provided segmentations.

The rest of the paper is structured as follows. In Section 2, we introduce our automatic system for segmenting nails. We cover all steps in detail, including the location of the tip of the toe and the nail using the Hough transform, super-pixel creation using the Quickshift algorithm based on color similarity, subsequent classification with gradient boosting, and watershed-based refinement to provide the final nail mask. Section 3 presents an analysis of the performance of the proposed system measured with quantitative metrics. Section 4 draws the conclusions and presents the strengths and shortcomings of the presented system.

2 Toenail segmentation method

The nail segmentation method consists of several steps as follows: tip of the toe location, nail circle estimation, super-pixel classification, and toenail pixel-wise segmentation. Each step refines the results provided by the previous one. We begin by describing the characteristics of the images we employed, which implicitly define the problem.

2.1 High-level description of toenails

Toenails are part of the outer layer of the skin. They are located at the end of toes and have a slightly different color from the skin owing to their composition: a hardened or horny cutaneous structure formed of keratin. Nails are composed of three parts: the lunula, a lighter region from which the nail grows; the nail plate, which covers the central part of the nail; and the distal end, in which the nail is no longer attached to the finger or toe surface. Nails are usually circular, although there is a wide diversity in terms of shape, with some of them being more similar to an ellipse or having squared corners.

The photometric properties of nail pixels do not contain sufficient information to segment them from the toes. In Fig. 1, pixels belonging to the nail and toe regions are grouped separately. We observe that their distributions across the different channels of the CIELAB color space are similar. Indeed, neither a single channel nor a combination of channels has proved to be sufficient to distinguish these pixels, particularly when considering pictures taken under different light conditions. Thus, although nails and toes can be easily discriminated by a human observer, performing this task based only on pixel-based local information is a difficult challenge.

Fig. 1 Distribution of nail and toe pixel values across the three channels of the CIELAB color space

The dataset used in this study was obtained according to the guidelines of a well-defined clinical trial. In particular, the images were captured in the office of practitioners using the embedded cameras in their smartphones. To control some of the environmental conditions, we designed a template to use as the scene background (see Fig. 2). During the acquisition process, however, we could not control some other environmental conditions, such as the capture viewpoint, camera setup, or illumination.

Fig. 2 Designed template for both feet, to be printed on high-quality matte paper of size A4

Before tackling the image segmentation problem, we performed an image normalization process based on the known measures of the template. It consists of transforming the input image, as shown in Fig. 3 (left), to an image with standard dimensions and orientation, as shown in Fig. 3 (right). We detected the position of the template corners and geometrically transformed the image using an affine mapping. Consequently, all normalized images appeared to have been taken from the same angle. The three template-colored squares were mapped to the top-right, bottom-right, and bottom-left image corners. In particular, left foot images are mirrored. Normalized images were always set to measure 1500 × 1500 pixels. Because the real region inside the template measures 5 × 5 cm, 1 cm in the normalized image equals 300 pixels, which can be used to easily measure distances and areas.
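The scale relation above can be made concrete with a minimal sketch. The constant and function names are illustrative assumptions; only the 1500-pixel side and the 5 cm template side come from the text.

```python
# Values taken from the text: the normalized image is 1500 px wide and
# covers the 5 cm side of the template region.
NORMALIZED_SIDE_PX = 1500
TEMPLATE_SIDE_CM = 5.0
PX_PER_CM = NORMALIZED_SIDE_PX / TEMPLATE_SIDE_CM  # 300 px per cm


def px_to_cm(length_px: float) -> float:
    """Convert a length measured in normalized-image pixels to centimeters."""
    return length_px / PX_PER_CM


def px_area_to_cm2(area_px: float) -> float:
    """Convert an area given as a pixel count to square centimeters."""
    return area_px / PX_PER_CM ** 2
```

Under these assumptions, a segmented nail mask containing 90,000 pixels would correspond to an area of 1 cm².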

Fig. 3 Original photograph of a left foot on the template (left) and its normalization (right)

Here, we describe the steps of segmenting the toenail from images, as shown in Fig. 3 (right). In particular, Fig. 4 contains a flow diagram along with the algorithm employed for each of them.

Fig. 4 Flow diagram of the robust toenail segmentation procedure

2.2 Tip of the toe and nail circle estimation

To select the tip of the toe, we segment the foot regions from the background template. Subsequently, the Hough transform is used to detect a circle in the foot region, which corresponds to the tip of the toe. Next, the nail is found within the tip of the toe using a second Hough transform, which depends on the result of the first step. In particular, the second Hough transform is computed by considering only the edges that are in the tip of the toe circular estimation. The transformations were performed as follows.

First, we identified the foot ROI. It is a mask that covers the part of the foot captured in the image, including both the skin and nails. Considering the contrast with the black background template, this can be easily obtained based on the photometry of the pixels, that is, by discriminating pixels based on a fixed range in the color space. In particular, we used the skin color range provided by Kovac et al. [16] and selected the largest connected component as the foot ROI.
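The foot ROI step can be sketched as follows. This is a minimal illustration under stated assumptions: the thresholds are the uniform-daylight RGB rule commonly attributed to Kovac et al., and whether the authors used this exact variant is not stated in the text. The 4-connectivity and all function names are likewise assumptions.

```python
from collections import deque


def is_skin(r: int, g: int, b: int) -> bool:
    """RGB skin rule (uniform daylight variant, assumed here)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def largest_component(mask):
    """Return a mask holding only the largest 4-connected True region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill starting from an unvisited skin pixel.
            seen[y0][x0] = True
            queue, component = deque([(y0, x0)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    roi = [[False] * w for _ in range(h)]
    for y, x in best:
        roi[y][x] = True
    return roi


def foot_roi(image):
    """image: rows of (R, G, B) tuples with 8-bit values."""
    skin = [[is_skin(*px) for px in row] for row in image]
    return largest_component(skin)
```

Keeping only the largest component discards isolated skin-colored pixels on the template, which is the role this step plays in the description above.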

Next, we computed the edges on the foot ROI by employing several instances of the Canny edge detector [9] using different values of the low threshold, $t_{\text{low}}$, the high threshold, $t_{\text{high}}$, and the standard deviation of the Gaussian filter, $\sigma$. We considered all possible combinations of the following values:

$$ \left\{ \begin{array}{rl} (t_{\text{low}},\ t_{\text{high}},\ \sigma) \big| & t_{\text{low}} \in \{ 5, 10, 15\},\\ & t_{\text{high}} \in \{ 5, 10, 15, 20, 25\},\\ & \sigma \in \{ 5, 10, 15\} \end{array} \right\} $$

By adding each of the Canny edge instances, we obtained a cumulative contour image, as shown in Fig. 5a.
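The accumulation over the parameter grid can be sketched as follows. The `detector` argument is a hypothetical callable standing in for a Canny implementation, and the function names are illustrative; only the grid values come from the text.

```python
from itertools import product

# Parameter values listed in the text.
T_LOW = (5, 10, 15)
T_HIGH = (5, 10, 15, 20, 25)
SIGMA = (5, 10, 15)


def canny_grid():
    """All (t_low, t_high, sigma) triples of the grid: 3 * 5 * 3 = 45."""
    return list(product(T_LOW, T_HIGH, SIGMA))


def cumulative_edges(image, detector):
    """Sum the binary edge maps returned by `detector` over the whole grid.

    `detector(image, t_low, t_high, sigma)` is a hypothetical callable that
    returns a 2D grid of 0/1 values with the same shape as `image`.
    """
    h, w = len(image), len(image[0])
    total = [[0] * w for _ in range(h)]
    for t_low, t_high, sigma in canny_grid():
        edges = detector(image, t_low, t_high, sigma)
        for y in range(h):
            for x in range(w):
                total[y][x] += int(edges[y][x])
    return total
```

With scikit-image, `skimage.feature.canny(image, sigma=..., low_threshold=..., high_threshold=...)` could play the role of the detector. Note that the grid as written contains triples in which the high threshold is below the low threshold; the text does not state how these cases were handled.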

Fig. 5 Edges and figures computed in the tip of the toe detection process

On the edge image, we applied the Hough transform to locate the tip of the toe. Owing to its position and size, the area inside this pattern contained the nail. Therefore, we applied the circular Hough transform on the edge image and selected the best candidate as the nearest circle to the template bottom-right corner that had a predefined percentage of its area within the foot ROI (we chose a threshold of 0.85). In addition, the computed circles were limited to a radius between 0.85 and 1.75 cm. The selected circle is shown in Fig. 5b. The circular pattern captures the tip of the toe; however, it does not locate the nail with an acceptable accuracy.
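The selection rule for the toe-tip circle can be sketched as follows, assuming that the circular Hough transform has already produced a list of candidate circles. Measuring closeness from the circle center to the last pixel of the image, and all names used, are assumptions made for illustration.

```python
from math import hypot


def fraction_inside(mask, cx, cy, r):
    """Fraction of the circle's pixels that fall on True entries of mask."""
    h, w = len(mask), len(mask[0])
    inside = total = 0
    for y in range(int(cy - r), int(cy + r) + 1):
        for x in range(int(cx - r), int(cx + r) + 1):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                total += 1
                # Pixels outside the image count as outside the ROI.
                if 0 <= y < h and 0 <= x < w and mask[y][x]:
                    inside += 1
    return inside / total if total else 0.0


def select_toe_tip(circles, mask, px_per_cm=300,
                   min_fraction=0.85, radius_cm=(0.85, 1.75)):
    """Pick the valid (cx, cy, r) circle nearest to the bottom-right corner."""
    h, w = len(mask), len(mask[0])
    r_min, r_max = radius_cm[0] * px_per_cm, radius_cm[1] * px_per_cm
    valid = [
        c for c in circles
        if r_min <= c[2] <= r_max and fraction_inside(mask, *c) >= min_fraction
    ]
    return min(valid, key=lambda c: hypot(c[0] - (w - 1), c[1] - (h - 1)),
               default=None)
```

The function returns `None` when no candidate satisfies both the radius and the ROI-overlap constraints.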

Finally, we detected the circle that best fitted the nail using the second Hough transform. We discarded edges distant from the tip of the toe circle (see Fig. 5c) so that we mainly kept the nail edges.

In addition, we constrained the radius of this circle according to the size of the toe tip. In particular, we expected the nail radius to be smaller than the radius of the tip of the toe circle (see Fig. 5b) yet larger than half of it. The most prominent circle (Fig. 5d) was selected as the nail circle. In our experiments, this procedure always found a location on the nail (the circle center) and an acceptable estimate of the nail size (derived from the radius).
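The two constraints of the second Hough transform can be sketched as follows. Interpreting "distant edges" as edge pixels outside the toe-tip circle and "most prominent" as the highest accumulator vote are assumptions, as are the function names.

```python
def edges_near_tip(edge_points, tip_circle):
    """Keep only the (x, y) edge pixels lying inside the toe-tip circle."""
    cx, cy, r = tip_circle
    return [(x, y) for x, y in edge_points
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]


def pick_nail_circle(candidates, tip_radius):
    """Most voted candidate whose radius lies strictly between r/2 and r.

    candidates: (cx, cy, radius, votes) tuples from a circular Hough
    transform; returns None when no candidate satisfies the constraint.
    """
    valid = [c for c in candidates if tip_radius / 2 < c[2] < tip_radius]
    return max(valid, key=lambda c: c[3], default=None)
```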

2.3 Super-pixel classification

The described process does not provide enough information to accurately segment the nail. A reasonable approach for refining the results might be using machine learning methods.

Therefore, we identified groups of pixels with some common characteristics. By considering these groups as entities for classification, the problem is simplified while maintaining sufficient information to segment the nail.

To group close and similar pixels, we use the Quickshift algorithm [22]. It divides the image into connected and uniform regions, the so-called super-pixels. This method has three main parameters: the ratio, which balances color-space and image-space proximities, for which we used a value of 1.0; the kernel size, which is the width of the Gaussian kernel used in smoothing the sample density, for which we used a value of 5.0; and the maximum distance, which is the threshold value for data distances, for which we used a value of 10 pixels.

As shown in Fig. 6, a set of connected super-pixels defines the nail contour accurately.
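The Quickshift configuration described above can be sketched with scikit-image, whose `quickshift` function exposes parameters named `ratio`, `kernel_size`, and `max_dist`. That the authors used this particular implementation is an assumption; the wrapper name is hypothetical.

```python
# Parameter values reported in the text; the keyword names follow
# scikit-image's quickshift signature.
QUICKSHIFT_PARAMS = {"ratio": 1.0, "kernel_size": 5.0, "max_dist": 10}


def superpixels(image):
    """Label each pixel of an RGB image (H x W x 3 array) with a super-pixel id."""
    # Imported lazily so that the parameter table above can be reused
    # without scikit-image installed.
    from skimage.segmentation import quickshift

    return quickshift(image, **QUICKSHIFT_PARAMS)
```

The returned label image assigns the same integer to all pixels of a super-pixel, which makes it straightforward to aggregate per-region features afterwards.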

Fig. 6 Super-pixel approach to identify regions as nail or toe

2.3.1 Classifier features

Here, we introduce the features used to determine whether a super-pixel is located in the nail region. Some features originate from the colorimetry of pixels. Other features reflect the geometric attributes of the super-pixels as a region. Finally, some other features are derived from the tip of the toe detection step, leveraging the best-circle nail estimation and foot ROI.

Table 1 lists all the extracted features.

Table 1 Features extracted from each image

Visually, we can distinguish the nail from the rest of the skin; therefore, we hypothesize that the colorimetry may provide some information regarding the relevance of a super-pixel. We used the channels of the following color spaces as features: RGB, HSV, and CIELAB.

The size of the area occupied by the super-pixel region and its perimeter may also be useful. However, we expected these characteristics to be correlated with the remaining features. For instance, the size of a super-pixel may indicate how variable the colorimetry is in its region.

Additionally, we considered features associated with the position of the centroid of the super-pixel. In particular, we considered the position in the X and Y axes, the distance to the rightmost skin pixel in the same row, the distance to the bottom-most skin pixel in the same column, and the distances to the centers of the toe tip and estimated nail circles. The radii of these two circles were also included as features; they are shared by all super-pixels of an image but change from one image to another.
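The positional features described above can be sketched as follows. The feature names are hypothetical, since the exact definitions in Table 1 are not reproduced here, and taking the row and column of the centroid by truncation is an assumption.

```python
from math import hypot


def positional_features(pixels, skin_mask, tip_circle, nail_circle):
    """Positional features of one super-pixel.

    pixels: list of (x, y) coordinates belonging to the super-pixel.
    skin_mask: 2D grid of booleans (True on foot ROI pixels).
    tip_circle, nail_circle: (cx, cy, r) tuples from the Hough steps.
    """
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    row, col = int(cy), int(cx)
    # Rightmost skin pixel in the centroid's row, bottom-most in its column.
    right = max((x for x, v in enumerate(skin_mask[row]) if v), default=col)
    bottom = max((y for y, line in enumerate(skin_mask) if line[col]), default=row)
    return {
        "area": len(pixels),
        "centroid_x": cx,
        "centroid_y": cy,
        "dist_right_skin": right - cx,
        "dist_bottom_skin": bottom - cy,
        "dist_tip_center": hypot(cx - tip_circle[0], cy - tip_circle[1]),
        "dist_nail_center": hypot(cx - nail_circle[0], cy - nail_circle[1]),
        # Shared by all super-pixels of the same image.
        "tip_radius": tip_circle[2],
        "nail_radius": nail_circle[2],
    }
```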

2.3.2 Classification

Different classifiers were used with the previously mentioned features to classify the super-pixels. We employed SVM, random forest, gradient boosting, and multi-layer perceptron. We used the implementations from the scikit-learn library so that the experiments can be reproduced. Table 2 summarizes the parameters we used for each classifier.

Table 2 Parameters of each classifier

2.3.3 Comparison of the performance of the classifiers

Table 3 summarizes the results obtained using different classifiers. For each classifier, we used the parameters recommended in the existing studies. The results are the average performances on the test samples. The performance metrics were computed using the test set previously unseen during the training stage.

Table 3 Performance metrics obtained with different classifiers, where the best value is highlighted for each metric

According to the results in Table 3, gradient boosting appears to be the best classifier. Each performance measure is affected differently by deviations from the ground truth. In particular, sensitivity is only affected by false negatives, whereas precision and specificity are only affected by false positives. The remaining measures provide a more balanced insight into the overall performance of the method. We particularly rely on the F1-measure and Cohen’s κ because they account for both types of errors and are robust to class imbalance. We observe that the multi-layer perceptron, after exhaustive hyper-parameter fine-tuning, could become a reasonable alternative. However, we selected gradient boosting as the default training model in this study because of its performance and ease of deployment.

2.4 Toenail segmentation

The final step is presented here to refine the nail location using the watershed algorithm [20].

We leverage the fact that nails are completely defined by their frontier, significantly better than by their colorimetry, size, or shape.

The watershed segmentation algorithm requires the definition of initial markers that grow until they fill the entire region. To initialize the algorithm, we used the probabilities of each super-pixel to belong to a class, provided by the gradient boosting classifier. Furthermore, we initialized some super-pixels as the initial watershed markers as follows:

  • Marked as background. Super-pixels in the region excluded by the foot ROI mask, after slightly eroding this region with a 5 × 5 kernel.

  • Marked as nail. Super-pixels whose estimated probability of being part of the nail is greater than 99.99%, to guarantee the correctness of the initial marker as much as possible.

  • Marked as skin. Super-pixels whose estimated probability of being skin is greater than 99.99%.

  • Unmarked. The remaining pixels are tagged by the watershed algorithm based on their closeness and similarity to the already marked pixels.
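The marker initialization listed above can be sketched as follows. The integer codes and function names are assumptions; only the 99.99% threshold and the four categories come from the text.

```python
# Integer codes are an assumption; 0 is reserved for unmarked regions.
UNMARKED, BACKGROUND, NAIL, SKIN = 0, 1, 2, 3
CONFIDENCE = 0.9999  # 99.99 %, as stated in the text


def marker_label(p_nail, p_skin, in_background):
    """Initial watershed marker for one super-pixel."""
    if in_background:
        return BACKGROUND
    if p_nail > CONFIDENCE:
        return NAIL
    if p_skin > CONFIDENCE:
        return SKIN
    return UNMARKED


def initial_markers(superpixels):
    """superpixels: iterable of (p_nail, p_skin, in_background) triples."""
    return [marker_label(*sp) for sp in superpixels]
```

Reserving zero for unmarked regions matches the convention of `skimage.segmentation.watershed`, where a zero in the `markers` array means that the pixel is not a marker and is left for the algorithm to assign.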

According to Fig. 7a, the contours of the nail are sharp. Thus, if the initial markers are correct, the growth of the nail and skin regions tends to stop at these edges. This is the case when processing images in practice, as shown in Fig. 7d.

Fig. 7 Watershed-based nail detection flow

3 Experimental results

Here, we explore the results provided by the algorithm and discuss the design decisions made during its development. We analyze the most informative features for distinguishing between the nail and toe regions, which proved to be the hardest practical problem. Additionally, we introduce quantitative performance measures to provide objective indicators of how successful the proposed method is.

3.1 Experimental framework

The dataset is composed of 348 images of human big toes acquired using the cameras attached to off-the-shelf smartphones used by doctors. A sample image is shown in Fig. 8. As previously explained, during the image acquisition stage, some parameters such as the illumination, specific capture viewpoint, and camera specifications could not be controlled.

Fig. 8 Dataset examples after the normalization process

To accurately evaluate the performance of the proposed method, we divided the images into two sets: 257 images for training and 91 for testing. All of them have a manually segmented ground truth that separates the classes of the toe, nail, and surrounding background (corresponding to the template employed).

Many performance metrics were used to quantify the results.

Because we were dealing with a segmentation task, we employed pixel-wise measures. The performance measures used were sensitivity, specificity, accuracy, precision, F-measure, and Cohen’s kappa. They complement each other, and each of them distinctly captures different deviations from the expected result.
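For reference, the six measures can be computed from the pixel-wise confusion counts with their standard definitions. The sketch below assumes that nail pixels are the positive class; the function name is illustrative.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Pixel-wise measures from the counts of a binary confusion matrix."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected by chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }
```

The definitions make the complementarity explicit: false negatives enter sensitivity but neither precision nor specificity, whereas false positives enter precision and specificity but not sensitivity.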

3.2 Method’s performance

We analyzed the importance of each feature used to distinguish the nail and toe regions. Figure 9 shows the importance of the features in the gradient boosting classifier, which was selected because of its performance (see Table 3). This feature importance is averaged over the trees that make up the gradient boosting ensemble. In individual trees, the feature importance measures the contribution of a variable to splitting the data at a specific node [10].

Fig. 9 Relative feature importance according to the gradient boosting classifier

We observe that the most informative feature of the classifier is a positional feature, that is, the distance between a super-pixel centroid and the center of the estimated nail circle. Figure 9 shows that the difference between this feature importance and the others is enormous in relative terms. This is because the regions identified as nails are closer to the circle, and the distant regions can be discarded.

Colorimetry represents the largest group of features used by the training models. The two most informative features in this group are the mean and standard deviation of the a* channel from the CIELAB color space.

Other features that provide important information are the super-pixel area and the distances in the Y axis to the nail circle and the lowest skin pixel.

Figure 10 demonstrates the evolution of the method accuracy when sequentially adding features to the gradient boosting classifier. The features are added in the order of importance, as shown in Fig. 9.
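The incremental experiment can be sketched as follows: features are ranked by importance, and nested subsets containing the top 1, top 2, and so on are generated, each of which would then be used to retrain and evaluate the classifier. The function names and the importance values in the usage note are illustrative, not taken from the study.

```python
def ranked_features(importances):
    """Feature names sorted from most to least important."""
    return sorted(importances, key=importances.get, reverse=True)


def incremental_subsets(importances):
    """Yield the top-1, top-2, ..., top-n feature lists in order."""
    ranking = ranked_features(importances)
    for k in range(1, len(ranking) + 1):
        yield ranking[:k]
```

For example, with hypothetical importances `{"dist_nail_center": 0.5, "a_mean": 0.2, "area": 0.1}`, the first subset contains only `dist_nail_center` and the last one contains all three features.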

Fig. 10 Different performance metrics for the gradient boosting classifier with a restricted number of features, ordered by their relative importance

Table 4 presents the results obtained by evaluating the final mask provided by the watershed algorithm against a manually-built ground truth.

Table 4 Performance metrics of the gradient boosting classification refined with the watershed algorithm

We highlight the role of the F1-measure, the harmonic mean of precision and recall, and of Cohen’s κ, which indicates the chance-corrected agreement between the predicted segmentation and the ground truth. We consider both values to be high, in particular when compared to other segmentation tasks.

4 Discussion and conclusions

Here, we discuss various aspects of segmenting nails from the rest of the skin using different techniques. Additionally, we present the limitations and contributions of this study.

Nail and skin cannot be segmented using only color or local information. The pixel-wise colors of human nails are indistinguishable from those of toes (see Section 2). This is particularly important in dealing with different skin shades or illumination conditions. However, we cannot disregard the photometric information (see Fig. 9).

Although nails may have diverse shapes, they tend to fit well in a circle. However, such circles do not necessarily delimit their actual edges.

The Hough transform was successfully used to estimate the nail circle after removing the contours between the toe and background.

Along the same line, the boundary between the nail and toe is sharp yet difficult to discriminate from spurious contours.

Using the watershed algorithm, we leveraged the fact that nails are appropriately defined by their contours, even better than by their colorimetry, size, or shape. To use this boundary, some locations inside and outside the nail must be known in advance.

The designed method robustly segments the toenail from images captured using different cameras, angles, and lighting conditions. It is currently used in clinical practice. In particular, it helps measure the nail surface and eases the temporal analysis of patients with a pathology that affects their toenails.

This study has two main limitations. First, we only considered skin shades from Caucasian patients because of the locations where we conducted the study. Second, a large share of the errors was located in the lateral area of the nails. These darker regions tend to be wrongly estimated as skin. However, this phenomenon only slightly affected the measurements of the nail area.

Despite such error-prone regions, we consider that our algorithm reliably segments nails from human toes, successfully performing the task. This consideration is based on the qualitative examination of several samples, its robustness across the entire set of images, and the quantitative metrics obtained.

A reasonable experimental framework is required to examine the correctness of the results. Performance measures were obtained by considering disjoint training and test sets, although the test set was also used to select the best classifier, which could have leaked some information from the test set into the model selection. In addition, the black background, which is significantly easier to segment, was not considered in computing the metrics. As the data belong to a pharmaceutical business, the database used in this study must remain private.

A trustworthy intelligent system, particularly in the field of medicine, must be able to explain its decisions and actions to human users by using techniques that produce more understandable models while maintaining high performance levels [1]. Thus, we analyzed the importance of each feature used to distinguish the nail and toe regions in the trees that make up the gradient boosting ensemble. Feature relevance techniques seem to be one of the most used schemes as a post-hoc explainability technique in the field of tree ensembles [2]. Feature relevance provides an explicit description of the inner behavior of the model, contributing to the goal of designing an explainable intelligent system.

Although the data acquisition was slightly controlled using a template, the photographs were captured with off-the-shelf smartphones, with uneven illumination, and during clinical practice. The robustness of the algorithm is a prerequisite to consider broadening its scope of application. Consequently, its application could also be examined (i) in less controlled environments and (ii) to segment fingernails, as it is a similar task to toenail segmentation.