Every cervical cell contains a nucleus, situated centrally within it. In many, if not most, methods, segmentation of these nuclei is a prerequisite for cervical cell segmentation [4, 10, 12, 14]. The more accurate the nucleus segmentation, the better the cell segmentation will be, since the presence of a nucleus confirms the existence of a cell around it. We describe our datasets, our nucleus segmentation algorithm, and its various tunable parameters in the following subsections. We start with a brief description of some relevant notions and notations.
Definitions
We define some terms here, most of which are related to the various contour properties we used in our algorithm. These definitions will facilitate a better understanding of our algorithm.
Contour size
Contour size is defined as the total number of pixels that the contour spans. It can also be defined in terms of the total area covered by the contour, but in our case we use the pixel count.
Solidity
Solidity is the ratio of contour area to its convex hull area as defined in the following equation.
$$\begin{aligned} \text{Solidity} = \frac{\text{Contour Area}}{\text{Convex Hull Area}} \end{aligned}$$
(1)
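As an illustration of Eq. (1), the following is a minimal pure-Python sketch. It assumes the contour is given as an ordered list of `(x, y)` vertices and measures area with the shoelace formula; the function names are hypothetical and this is not the implementation used in our pipeline, where OpenCV routines such as `cv2.contourArea` and `cv2.convexHull` would play the same roles.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as ordered (x, y) vertices."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def solidity(contour):
    """Ratio of contour area to convex hull area, as in Eq. (1)."""
    hull_area = polygon_area(convex_hull(contour))
    if hull_area == 0:
        return 0.0  # degenerate contour (fewer than three non-collinear points)
    return polygon_area(contour) / hull_area


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(solidity(square), solidity(l_shape))
```

A convex shape such as the square has solidity 1, while the concave L-shaped polygon has a lower value because its convex hull covers area that the contour itself does not.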
Inertia ratio
The inertia ratio is defined as the ratio of the length of the minor axis of an elliptical object to the length of the major axis.
$$\begin{aligned} \text{Inertia Ratio} = \frac{\text{Length of the minor axis}}{\text{Length of the major axis}} \end{aligned}$$
(2)
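One possible way to evaluate Eq. (2) for an arbitrary region is sketched below. It rests on an assumption that is not stated above: the axis lengths are estimated from the second-order central moments of the region's pixel coordinates, so that each axis length is proportional to the square root of an eigenvalue of the coordinate covariance matrix. The function name is hypothetical.

```python
import math


def inertia_ratio(pixels):
    """Minor-to-major axis ratio estimated from second-order central moments.

    pixels: iterable of (x, y) coordinates belonging to the region.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("region must contain at least one pixel")
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    mu20 = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - mean_y) ** 2 for _, y in pixels) / n
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in pixels) / n

    # Eigenvalues of the 2x2 covariance matrix [[mu20, mu11], [mu11, mu02]].
    half_trace = (mu20 + mu02) / 2.0
    spread = math.sqrt(((mu20 - mu02) / 2.0) ** 2 + mu11 ** 2)
    major_var = half_trace + spread
    minor_var = max(half_trace - spread, 0.0)
    if major_var == 0:
        return 1.0  # a single pixel has no preferred direction
    return math.sqrt(minor_var / major_var)


round_blob = [(x, y) for x in range(5) for y in range(5)]
elongated = [(x, y) for x in range(10) for y in range(2)]
print(inertia_ratio(round_blob), inertia_ratio(elongated))
```

The square block yields a ratio of 1, whereas the 10 by 2 strip yields a value well below 1, which is the kind of elongated region the algorithm later rejects.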
Datasets
We use two different datasets to train and evaluate our nucleus segmentation method. The first is a publicly available dataset (referred to as the ISBI dataset henceforth), released by ISBI in 2014 [5]. It contains 45 synthetic images for training, along with 900 synthetic images and 16 real cervical cytology EDF (Extended Depth of Field) images for testing. The synthetic images in the ISBI dataset were created by mirror transformations of the background, and by random rigid geometric and random linear brightness transforms of different annotated isolated cells in real EDF images [12]. The synthetic images have size \(512 \times 512\), and each of them contains 2–10 different cells, while the real EDF images have size \(1024 \times 1024\). All 45 training images and the 900 synthetic testing images are accompanied by their nuclei annotations for quantitative evaluation, whereas the EDF images are to be used for qualitative evaluation.
The second dataset is a private dataset (referred to as the BSMMU dataset henceforth) collected from the Department of Pathology of Bangabandhu Sheikh Mujib Medical University (BSMMU), Dhaka, Bangladesh [25]. Ten cervical pap smear cytology slides without any diagnosed abnormality were randomly selected from the archive of the Department of Pathology, BSMMU. The slides were taken anonymously, without any identifiable information about the patients, and therefore ethical approval was not required. The slides were prepared using BD SurePath\(^{{\mathrm{TM}}}\) Liquid-Based Pap Test technology according to the manufacturer’s instructions [26]. The Papanicolaou staining procedure was used [27]. Each slide contained about 5000 epithelial cells, yielding approximately 50000 epithelial cells in total across all ten slides. All the slides were scanned using a Hamamatsu NanoZoomer-SQ Digital Slide Scanner C13140-01 at the highest resolution (0.23 micrometer/pixel) under manual settings [28]. The images were saved in NDPI format with JPEG compression. We manually annotated 25 real cervical cytology images of size \(250 \times 250\), of which 10 were randomly selected to tune the parameters of our algorithm, while the remaining 15 were used for quantitative and qualitative evaluation. No validation set was used. The images contained 10–23 different cells along with their respective nuclei.
Nucleus segmentation
In a cervical cytology image, nuclei are the most prominently visible regions. Commonly, they are relatively dark, uniformly shaped convex regions. Generally, they are circular or have an elliptical shape [4] except for some rare cases in the real cervical images, where, due to the 2D image being scanned from different depths, they may be somewhat irregularly shaped. We develop our algorithmic approach around four of the most visually distinctive properties of a nucleus: size, solidity, inertia ratio, and average intensity.
Before starting the actual nucleus segmentation procedure, a preprocessing step may be needed to convert an RGB image into grayscale. The ISBI dataset is already in grayscale, so it needs no further processing. The BSMMU dataset is in RGB, and we explored several ways to convert its images to grayscale. First, we tried averaging the intensity values of the three channels. Although this is the most rudimentary method of converting RGB to grayscale, it did not preserve enough features for the nucleus segmentation to work properly. Second, we tried taking each channel separately. We also tried averaging two channels together, excluding the third. Through careful observation, it became apparent that, for the BSMMU dataset, the green channel of the RGB image carries the most information, and so the best way to convert to grayscale is to take the green channel only. Thus, we converted the RGB images of the BSMMU dataset to grayscale by taking the green channel’s intensity value only. Notably, similar findings about the green channel have been reported in the literature as well (e.g., [29, 30]).
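The two conversions compared above can be sketched as follows. This is a minimal illustration that assumes an image stored as rows of `(R, G, B)` integer tuples; the function names are hypothetical. With NumPy or OpenCV arrays, the green channel would typically be taken as `image[:, :, 1]`, since green sits at index 1 in both RGB and BGR channel orders.

```python
def green_channel(rgb_image):
    """Grayscale conversion that keeps only the green intensity of each pixel."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def channel_average(rgb_image):
    """Grayscale conversion by (integer) averaging of the three channels."""
    return [[sum(pixel) // 3 for pixel in row] for row in rgb_image]


# A tiny one-row image with two pixels, used only for illustration.
image = [[(10, 200, 30), (0, 50, 255)]]
print(green_channel(image))
print(channel_average(image))
```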
The cervical image is first smoothed using a Gaussian blur filter. Then adaptive thresholding is applied, since a nucleus is the darkest visible region within its cytoplasm. We use the built-in adaptive threshold function of OpenCV [31]. For this function, two parameters need to be carefully tuned, namely, the window size and the constant “C” (more details on these and other tunable parameters are presented in section "Tunable parameters"). The window size should be larger (smaller) for a dataset like the ISBI (BSMMU) dataset, where nuclei are more zoomed in (out). The second parameter, the constant “C”, is subtracted from the mean or weighted mean calculated within the window. This constant needs to be smaller (larger) for an image where the contrast between the nucleus and the rest of the image is higher (lower). Due to overlapping cells, superficial noise, and artifacts, the thresholded image still contains unwanted regions to varying degrees, more so in the real cervical images (i.e., the BSMMU dataset). In the second stage, to reduce the number of unwanted regions, we use a convolution filter that was implemented by Li and Chutatape [32] using Kirsch’s method [33]. This filter computes the gradient in eight different directions by convolving the image with eight different template response arrays, as shown in Fig. 1. The final gradient is set to the largest of the eight. A threshold is then applied to determine whether a pixel belongs to an edge or not, so the final response contains the various edges detected in the image [32]. We do not actually need the detected edges themselves. However, the filter’s response on uniform, dense, convex regions is weaker than on irregularly shaped non-convex regions. Thus, by subtracting the filter’s response from the previously thresholded image, we can remove much of the unwanted noise caused by irregular shapes. This step has the undesirable side effect of reducing the size of the regions containing the actual nuclei, which we address in the later part of our algorithm.
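The second stage can be sketched in pure Python as follows. The sketch assumes the standard Kirsch compass kernels (weights 5 and -3 rotated around a zero center); the templates in Fig. 1 may differ, for example in scaling. The function names and the `edge_threshold` argument are hypothetical, and image borders are simply left unmarked.

```python
# Border positions of a 3x3 kernel, listed clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def kirsch_kernels():
    """Build the eight compass kernels by rotating the weights around the ring."""
    base = [5, 5, 5, -3, -3, -3, -3, -3]
    kernels = []
    for shift in range(8):
        kernel = [[0] * 3 for _ in range(3)]
        for i, (r, c) in enumerate(RING):
            kernel[r][c] = base[(i - shift) % 8]
        kernels.append(kernel)
    return kernels


def kirsch_edges(gray, edge_threshold):
    """Binary edge map: 255 where the largest directional response exceeds the threshold."""
    h, w = len(gray), len(gray[0])
    kernels = kirsch_kernels()
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            best = max(
                sum(k[dy][dx] * gray[y + dy - 1][x + dx - 1]
                    for dy in range(3) for dx in range(3))
                for k in kernels
            )
            if best > edge_threshold:
                edges[y][x] = 255
    return edges


def subtract_edges(binary, edges):
    """Keep a foreground pixel only if the edge map did not fire on it."""
    return [[255 if b and not e else 0 for b, e in zip(b_row, e_row)]
            for b_row, e_row in zip(binary, edges)]


step = [[0, 0, 0, 200, 200] for _ in range(5)]
print(kirsch_edges(step, 1000)[2])
```

Because every kernel's weights sum to zero, a perfectly uniform area produces no response, whereas the vertical step in the example fires on both sides of the intensity jump.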
In the third stage, we extract all the contours from the thresholded image using the built-in contour detection function of OpenCV [31] and examine their properties one by one. We calculate their size, solidity, and inertia ratio. Since nuclei are uniformly shaped, solid, convex regions [4], they have high solidity, always above 0.8 and most of the time above 0.9. Any contour with solidity lower than a preset minimum solidity value is therefore rejected and removed from the image during this step. This minimum solidity is a tunable parameter, as described in section "Tunable parameters". We also remove contours that are too small or too big in this step. The acceptable size can differ from dataset to dataset depending on the image’s zoom level and is thus kept as a tunable parameter. Finally, we reject regions with a low inertia ratio, since any such region is too elongated to be a proper nucleus. The minimum size, maximum size, and minimum inertia ratio are also tunable parameters of the algorithm, which are described in section "Tunable parameters".
In the fourth and final stage, we recover the size of the nucleus regions, which was reduced during the second stage. This is an iterative procedure in which the pixels in the immediate neighborhood of the nucleus region are checked one by one to see whether they also belong to that region. The measure that determines the validity of a pixel is its intensity level. For a given nucleus region, we first compute the average intensity of all the pixels that already belong to that region. If a neighboring pixel of that nucleus region has an intensity value within a certain range of this average intensity, then that pixel is deemed valid and is added to the region, thereby extending the nucleus region. This allowable average intensity range is also a tunable parameter of the algorithm (section "Tunable parameters"). During the parameter tuning stage (section "Tunable parameters"), it was revealed that this range should be smaller (larger) for a dataset with low (high) contrast between a nucleus and the outer cytoplasm. This iterative procedure continues until one of the following three conditions is met:
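The acceptance test applied to each contour in this stage can be sketched as below. The sketch assumes that the three contour properties have already been measured; the `ContourStats` container, the function name, and the threshold values in the usage lines are hypothetical and are not the tuned values from our experiments.

```python
from dataclasses import dataclass


@dataclass
class ContourStats:
    size: int             # number of pixels spanned by the contour
    solidity: float       # contour area / convex hull area
    inertia_ratio: float  # minor axis length / major axis length


def is_nucleus_candidate(stats, min_size, max_size, min_solidity, min_inertia):
    """Return True only if the contour passes the size, solidity, and inertia checks."""
    return (min_size <= stats.size <= max_size
            and stats.solidity >= min_solidity
            and stats.inertia_ratio >= min_inertia)


# Illustrative contours: one plausible nucleus, one non-convex blob, one thin streak.
contours = [
    ContourStats(size=120, solidity=0.92, inertia_ratio=0.70),
    ContourStats(size=120, solidity=0.60, inertia_ratio=0.70),
    ContourStats(size=120, solidity=0.92, inertia_ratio=0.10),
]
kept = [c for c in contours if is_nucleus_candidate(c, 50, 500, 0.8, 0.3)]
print(len(kept))
```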
1. No more valid pixels can be found in the immediate neighborhood of the contour boundary.
2. The overall size of the contour (nucleus region) has become larger than the predetermined maximum size of a nucleus.
3. The solidity of the overall contour (nucleus region) has become smaller than a preset solidity value. This value is set a bit lower than the usual solidity of a valid nucleus, which, from observation, is 0.8. A value of 0.75 works well here.
Conditions 2 and 3 above act as checks against the uncontrolled growth of the regions in low-contrast cervical cell images. Most cervical cell images have a high contrast between the nucleus and the cytoplasm, and thus a carefully set acceptable average intensity range acts as the criterion that ends this nucleus recovery procedure. But some cells have very low contrast. This can either be a trait of these cells due to a high overlapping area, or it can be due to the image scanner focusing on the wrong depth when capturing the cervical cytology image. The high or low contrast mentioned here does not refer to any objective measure of contrast; rather, it refers to subjective human observation. The parameter tuning stage of our approach, which is described in section "Tunable parameters", does not need an objective measure of contrast. In any case, Conditions 2 and 3 essentially stop the overzealous growth of the regions. The steps of the algorithm are formally presented in Algorithm 1. Also, Fig. 2 shows our algorithm in action on some synthetic and real cervical cytology images.
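The recovery procedure and its three stopping conditions can be sketched as follows. This is a simplified illustration rather than a transcription of Algorithm 1, and it makes several assumptions: regions are sets of `(row, col)` pixels, the neighborhood is 4-connected, the average intensity is recomputed after every growth step, and solidity is supplied through a hypothetical `solidity_fn` callback (the third condition is skipped when it is omitted).

```python
def grow_region(gray, seed_pixels, intensity_range, max_size,
                min_solidity=0.75, solidity_fn=None):
    """Iteratively extend a nucleus region with neighboring pixels of similar intensity."""
    region = set(seed_pixels)
    if not region:
        raise ValueError("seed region must contain at least one pixel")
    h, w = len(gray), len(gray[0])
    while True:
        mean = sum(gray[y][x] for y, x in region) / len(region)
        frontier = set()
        for y, x in region:
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and (ny, nx) not in region:
                    if abs(gray[ny][nx] - mean) <= intensity_range:
                        frontier.add((ny, nx))
        if not frontier:
            break  # Condition 1: no valid neighboring pixel is left
        region |= frontier
        if len(region) > max_size:
            break  # Condition 2: region exceeds the maximum nucleus size
        if solidity_fn is not None and solidity_fn(region) < min_solidity:
            break  # Condition 3: region has become too non-convex
    return region


# A dark 3x3 nucleus (intensity 50) on a bright background (intensity 200).
gray = [[200] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        gray[y][x] = 50
recovered = grow_region(gray, {(2, 2)}, intensity_range=20, max_size=100)
print(len(recovered))
```

In the example, growth from the single center pixel stops at the nucleus boundary through Condition 1, because the background pixels lie far outside the allowed intensity range.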
Tunable parameters
We have already briefly mentioned our tunable parameters while describing the algorithm in the previous section. In this section, we elaborate on those.
- Adaptive threshold window size (wsize): The window size is a tunable parameter because it needs to be larger (smaller) for a dataset where the nuclei are more zoomed in (out). Thus, for the ISBI (BSMMU) dataset, wsize should be larger (smaller).
- Adaptive threshold offset (C): This offset value is subtracted from the mean or the weighted mean calculated within the window. The value of C needs to be smaller for a dataset where the contrast between the nucleus and the rest of the image is higher, and larger otherwise.
- Range of intensity (Irange): This value is the maximum allowed difference between the average intensity of the contour and the intensity of the neighboring pixel being considered. The value of Irange should be smaller for a dataset where the contrast between the nucleus and the outer cytoplasm is low, and larger otherwise.
- Minimum solidity (MinSolid): This is the minimum solidity value that a contour in Stage 3 must reach to be considered a valid nucleus. Usually, a value around 0.8 yields very few false positives, while lower values can be used to accept more contours at the risk of more false positives.
- Minimum and maximum size (MinSize and MaxSize): These values are used in Stage 3 to filter out contours that are too small or too big, which are considered noise. Their values depend on the zoom level of the nuclei in the dataset.
- Minimum inertia ratio (MinInertia): This is used in Stage 3 to filter out contours that are too elongated, which are usually noise.
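To make the roles of wsize and C concrete, the sketch below implements a plain mean-based adaptive threshold in pure Python. It is an illustration only and makes two assumptions: dark pixels are marked as foreground (analogous to the inverted binary mode of OpenCV's `cv2.adaptiveThreshold`), and windows are truncated at the image border, whereas OpenCV pads the border instead. The function name is hypothetical.

```python
def adaptive_mean_threshold(gray, wsize, c):
    """Mark a pixel as foreground (255) if it is not brighter than (local mean - c)."""
    if wsize < 3 or wsize % 2 == 0:
        raise ValueError("wsize must be an odd integer >= 3")
    h, w = len(gray), len(gray[0])
    radius = wsize // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - radius), min(h, y + radius + 1))
            cols = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in rows for i in cols]
            local_mean = sum(window) / len(window)
            if gray[y][x] <= local_mean - c:
                out[y][x] = 255
    return out


# One dark pixel (50) in the middle of a bright 5x5 background (200).
gray = [[200] * 5 for _ in range(5)]
gray[2][2] = 50
mask = adaptive_mean_threshold(gray, wsize=3, c=10)
print(sum(value == 255 for row in mask for value in row))
```

In this sketch, wsize controls how much of the surroundings contributes to the local mean, and C shifts the local threshold downward, so only pixels sufficiently darker than their neighborhood are retained.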
Our approach has seven tunable parameters. Manually tuning these parameters is undesirable and inefficient. It also makes the whole approach dependent on the dataset and hampers its generalizability. To circumvent this issue, we use a parameter tuning script on a small set of labeled training images (10 images are enough) to tune all of the parameters of our algorithm. This script runs a grid search over combinations of the parameters’ values and selects the combination that results in the highest value of the chosen performance measure. The script can use precision, recall, F1-score, or Aggregate Jaccard Index (AJI) [22] as the performance measure for tuning the parameters of our algorithm. The procedure for calculating the different performance measures is described in section "Evaluation metrics". The parameter tuning script can be found in our online repository [34].
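The idea behind the grid search can be sketched in a few lines. This is a generic illustration and not the script from the repository [34]: the function name, the candidate values, and the toy scoring function are all hypothetical, and in practice `score_fn` would run the segmentation on the labeled training images and return the chosen performance measure.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values for two of the seven parameters.
grid = {"wsize": [7, 11, 15], "C": [3, 5, 7]}


def toy_score(params):
    # Stand-in for a real measure such as F1-score; peaks at wsize=11, C=5.
    return -(params["wsize"] - 11) ** 2 - (params["C"] - 5) ** 2


best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The number of evaluations grows as the product of the candidate list lengths, which is why the candidate values for each of the seven parameters have to be kept to a modest number.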