Analytical and Bioanalytical Chemistry, Volume 398, Issue 5, pp 2191–2201

Robust classification of low-grade cervical cytology following analysis with ATR-FTIR spectroscopy and subsequent application of self-learning classifier eClass

Authors

  • J.G. Kelly
    • Centre for Biophotonics, Lancaster Environment Centre, Lancaster University
  • Plamen P. Angelov
    • Centre for Biophotonics, Lancaster Environment Centre, Lancaster University
    • Intelligent Systems Research Laboratory, School of Computing and Communications, Lancaster University
  • Júlio Trevisan
    • Centre for Biophotonics, Lancaster Environment Centre, Lancaster University
    • Intelligent Systems Research Laboratory, School of Computing and Communications, Lancaster University
  • Anastasia Vlachopoulou
    • Department of Obstetrics and Gynaecology, University Hospital of Ioannina
  • Evangelos Paraskevaidis
    • Department of Obstetrics and Gynaecology, University Hospital of Ioannina
  • Pierre L. Martin-Hirsch
    • Lancashire Teaching Hospitals NHS Trust
  • Francis L. Martin
    • Centre for Biophotonics, Lancaster Environment Centre, Lancaster University
Original Paper

DOI: 10.1007/s00216-010-4179-5

Cite this article as:
Kelly, J.G., Angelov, P.P., Trevisan, J. et al. Anal Bioanal Chem (2010) 398: 2191. doi:10.1007/s00216-010-4179-5

Abstract

Although the UK cervical screening programme has reduced mortality associated with invasive disease, a high-throughput predictive methodology that is cost-effective and robust could greatly support the current system. We combined attenuated total reflection Fourier-transform infrared spectroscopy of cervical cytology with the self-learning classifier eClass. This predictive algorithm can cope with large amounts of multidimensional data with variable characteristics. Using a characterised dataset [set A: UK cervical specimens designated as normal (n = 60), low-grade (n = 60) or high-grade (n = 60)] and a further dataset (set B) of n = 30 low-grade samples, we set out to determine whether this approach could be robustly predictive. Extending the set A training set with varying amounts of set B data produced good classification rates with three two-class cascade classifiers. However, a single three-class classifier was equally efficient, producing a user-friendly, applicable methodology with improved interpretability (i.e., comparable classification with only one set of fuzzy rules). As data from set B were added incrementally to the training set, the model learned and evolved. Additionally, monitoring the results for the set B specimens (known to be low-grade cervical cytology) provided an opportunity to explore whether patients likely to progress towards invasive disease could be distinguished. eClass exhibited remarkably robust predictive power in a user-friendly fashion (i.e., high throughput, ease of use) compared with other classifiers (k-nearest neighbours, support vector machines, artificial neural networks). Developing eClass to classify such datasets for applications such as screening could support the robust identification of a dichotomous marker of invasive disease progression.

Figure

Mid-IR spectral data of exfoliative cervical cytology may be classified by eClass, facilitating prediction in a clinical setting

Keywords

ATR-FTIR spectroscopy · Cancer screening · eClass · Exfoliative cervical cytology · Fuzzy · Prediction

Abbreviations

ANN: Artificial neural networks
ATR-FTIR spectroscopy: Attenuated total reflection Fourier-transform infrared spectroscopy
CIN: Cervical intraepithelial neoplasia
eClass: Evolving fuzzy classifier
HPV: Human papillomavirus
HSIL: High-grade squamous intraepithelial lesions
IR: Infrared
k-NN: k-Nearest neighbours
PCA-LDA: Principal component analysis–linear discriminant analysis
SV: Support vector
SVM: Support vector machine
νasPO2−: Antisymmetric phosphate stretching vibrations
νsPO2−: Symmetric phosphate stretching vibrations

Introduction

Cervical cancer is the fifth most common cancer in women worldwide and is associated with human papillomavirus (HPV), a sexually transmitted viral infection [1]. A screening programme implemented in the UK has reduced cervical cancer mortality considerably. However, cases are often misclassified, resulting in false positives (poor specificity) and/or false negatives (poor sensitivity). Currently, the Papanicolaou test has a true positive rate (sensitivity) of 59% and a true negative rate (specificity) of 69% [2, 3]. Whilst false positives result in patients receiving unnecessary treatment, women receiving false negative results will not receive any treatment; the consequence could be the development of invasive carcinoma before their next recall (the UK system recalls women every 3 to 5 years from the age of 25). Both scenarios should be minimised, with priority given to the latter. In low-grade lesions, atypical cells account for only 5–7% of a smear, which makes them difficult to detect and leads to misclassification [4].
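The true positive and true negative rates quoted above are the test's sensitivity and specificity. As a small worked illustration, the sketch below computes both from confusion-matrix counts; the counts are hypothetical, chosen only so that the rates come out at 59% and 69%.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of diseased cases that test positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of disease-free cases that test negative."""
    return tn / (tn + fp)


# Hypothetical counts (not from the cited studies): 100 diseased, 100 disease-free women.
print(sensitivity(tp=59, fn=41))  # 0.59 -> 41 of 100 lesions missed (false negatives)
print(specificity(tn=69, fp=31))  # 0.69 -> 31 of 100 healthy women flagged (false positives)
```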

The UK cervical cancer screening programme requires the sampling of cells from the transformation zone of the cervix, spanning the endocervical zone of columnar mucous-secreting epithelial cells and the ectocervical zone of thick, stratified squamous epithelial cells. Samples are processed by liquid-based cytology to remove contaminants [2]. Exfoliative cervical cytology may be characterised as normal (free of atypia), low-grade [cervical intraepithelial neoplasia (CIN) 1 or low-grade squamous intraepithelial lesions] or high-grade [CIN2/3 or high-grade squamous intraepithelial lesions (HSIL) or severe dyskaryosis (? carcinoma)]. Around 60–70% of low-grade lesions are known to regress without treatment, with the remainder progressing to HSIL [5]. Identifying women with a high likelihood of progression would facilitate earlier treatment through better targeting of individuals. An important aspect of this project is to monitor the data acquired per patient amongst low-grade specimens to determine whether a pattern may be derived to identify women likely to progress to HSIL.

Risk factors for HPV infection include smoking, oral contraception and number of sexual partners [6]. The current UK HPV vaccination programme aims to protect prepubescent girls from the most common oncogenic HPV types (HPV16 and HPV18) [7]. Over 100 types of HPV have been characterised, including 13 with high oncogenic potential; although vaccination might facilitate prevention, it will not eradicate this disease [8]. A number of biophysical techniques have been explored as potential diagnostic tools, including Fourier-transform infrared (FTIR) spectroscopy, Raman spectroscopy and photothermal microspectroscopy [9–11]. These spectroscopic tools provide chemical information in the form of a spectrum representative of the molecules present in a given sample. Infrared (IR) spectroscopy has been used to derive spectra from cells and tissues to facilitate the identification of disease [12]. IR and Raman spectroscopy are considered complementary techniques because each is sensitive to a different range of chemical bonds. Biological applications of Raman spectroscopy include cervical cancer diagnosis [13] and the identification and diagnosis of oesophageal adenocarcinoma [14]. The high-throughput and noninvasive nature of these methodologies supports their potential as novel tools for disease characterisation and diagnostics.

The potential value of spectroscopic tools in cancer diagnostics arises from their capacity to distinguish cells that are committed to becoming cancerous before such changes are detectable by conventional screening. This would lead to earlier detection, which is often accompanied by a better prognosis. The progression from a research technique to an automated, sensitive and objective tool for clinical use hinges on coupling an appropriate, robust multivariate analysis to the device. Attenuated total reflection FTIR (ATR-FTIR) spectroscopy has previously been employed to interrogate cervical cytology, both to identify biochemical differences between cytological grades and to predict the grade of unknowns [2, 3, 15–17].

ATR-FTIR spectroscopy is a robust high-throughput methodology capable of detecting cancerous lesions and identifying cervical atypia [2, 17, 18]. Chemical bonds absorb IR at different wavenumbers in the mid-IR region (4,000–400 cm−1), and the 1,800–900 cm−1 region provides a biochemical cell fingerprint representative of the biomolecules present in a given biological sample. IR spectroscopy may therefore be employed across a range of biological settings [9, 19–21]. During spectral acquisition with an ATR-FTIR spectrometer, a diamond crystal of size 250 × 250 μm is placed in contact with the sample. IR is directed through the crystal and totally internally reflected, generating an evanescent wave that extends a few microns into the sample. The sample in contact with the crystal absorbs this IR, and an absorbance spectrum is obtained following Fourier transformation of the detected signal. The large sampling aperture and the acquisition of several spectra per sample maximise the likelihood of detecting atypia. Spectra must be pre-processed prior to analysis to account for sample thickness, sample coverage and the pressure of the crystal on the sample during spectral acquisition; processing may include baseline correction and normalisation. Detectable chemical entities in the biochemical cell fingerprint region include 1,750 cm−1 (lipid), 1,650 cm−1 (amide I), 1,550 cm−1 (amide II), 1,260 cm−1 (amide III), 1,225 cm−1 (antisymmetric phosphate stretching vibrations, νasPO2−), 1,080 cm−1 (symmetric phosphate stretching vibrations, νsPO2−), 1,030 cm−1 (glycogen) and 970 cm−1 (protein phosphorylation); band shifts associated with conformational changes can also be identified [22, 23]. This high-throughput technique has clinical implications for cancer diagnostics if coupled with a suitable computational classification methodology.
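How far the evanescent wave probes into the sample can be estimated with the standard ATR penetration-depth expression d_p = λ / (2π n1 √(sin²θ − (n2/n1)²)), where λ is the wavelength, n1 and n2 are the refractive indices of the crystal and the sample, and θ is the angle of incidence. The sketch below evaluates it across the fingerprint region. The values n1 = 2.4 (diamond), n2 = 1.5 (a typical value assumed for biological material) and θ = 45° are illustrative assumptions; the text does not state them for this accessory.

```python
import math


def penetration_depth_um(wavenumber_cm: float, n_crystal: float = 2.4,
                         n_sample: float = 1.5, angle_deg: float = 45.0) -> float:
    """Depth (micrometres) at which the evanescent field amplitude falls to 1/e.

    Default refractive indices and incidence angle are assumptions for illustration.
    """
    wavelength_um = 1e4 / wavenumber_cm          # convert cm^-1 to micrometres
    sin_theta = math.sin(math.radians(angle_deg))
    ratio = n_sample / n_crystal
    if sin_theta <= ratio:                       # below the critical angle: no TIR
        raise ValueError("angle of incidence is below the critical angle")
    return wavelength_um / (2 * math.pi * n_crystal * math.sqrt(sin_theta**2 - ratio**2))


for nu in (1800, 1650, 1225, 1000, 900):
    print(nu, round(penetration_depth_um(nu), 2))
```

Under these assumptions d_p is about 1.1–2.2 μm and grows with wavelength, so lower-wavenumber bands sample slightly deeper into the specimen.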

The rationale for developing a classification tool is to facilitate the prediction of exfoliative cervical cytology grade following ATR-FTIR spectroscopy. The application of ATR-FTIR spectroscopy followed by eClass is a novel approach to the characterisation and prediction of IR spectra from biological specimens. An initial investigation using eClass with cervical cytology data divided the dataset into a training set (eClass learns from IR data labelled by cytology grade) and a prediction set [eClass classifies IR data without cytological labels (unknowns)], producing an average classification rate of 77% [17]. This project extends that work with the aim of reducing the misclassification rate; it focuses on developing the eClass classifier, comparing it with alternative approaches, and simulating a real-world application with new, unseen data. The major benefit of eClass is its ability to adapt to new data without human (expert) involvement or retraining [17, 24, 25]. During the training stage, a model is produced to represent the dataset, consisting of a number of fuzzy rules that encompass the information needed to characterise each class. This model may then be applied to the predictive classification of unknown data. In principle, feeding the eClass model with new data should produce more accurate and reliable classification rates as the model adapts to the new data and becomes more robust. The linguistic output describes the fuzzy rules and the features responsible for distinguishing each class, providing additional information about the samples. A linguistic result produced by eClass takes the form of a logical “IF-THEN” statement that links a test sample to a given class.

The aim of this project was to further explore the feasibility of our previously reported classification tool as part of a high-throughput technology to support the current cervical screening programme [17]. IR spectral data acquired from an additional 30 exfoliative cervical cytology samples were employed to expand, develop and test eClass in a ‘real-world’ scenario. These additional data provide a unique opportunity to optimise eClass and investigate the ability of the classifier to adapt. In particular, because the original and new specimens come from different sources, reasonable predictive power would support the potential of this methodology as part of a cervical screening programme. The alternative computational classifiers k-nearest neighbours (k-NN), artificial neural networks (ANN) and support vector machines (SVM) were compared with eClass.

Materials and methods

Sample preparation

A cohort of low-grade squamous intraepithelial cervical cytology samples was acquired from 30 patients with local Institutional Review Board (University Hospital of Ioannina, Greece) approval. Cytology samples were collected into ThinPrep solution (PreservCyt™ solution, Cytyc Corp., Boxborough, MA, USA). The exfoliative cytology in ThinPrep was centrifuged, the supernatant removed and the remaining cell pellet resuspended in autoclaved distilled water; this washing step was repeated three times. The resuspended pellet was pipetted onto low-E reflective glass slides (Kevley Technologies, Chesterland, OH, USA) and stored in a desiccator until analysis.

Data acquisition

An existing dataset (set A) used to develop eClass was derived from spectral analysis of cervical cytology from 180 patients in a UK-based cohort. Ten IR spectra were acquired from ten independent locations across each patient sample, giving an initial dataset of 600 normal, 600 low-grade and 600 high-grade IR spectra [11, 17]. A second dataset (set B), consisting of 300 IR spectra from the 30 new specimens, was obtained in the same fashion. All IR spectra were acquired using a Bruker Vector 22 FTIR spectrometer with a Helios ATR attachment containing a diamond crystal of size 250 × 250 μm. The spectra were acquired at 8 cm−1 resolution with 32 co-added scans and converted to absorbance by Bruker OPUS software. The crystal was cleaned with distilled water before use and between samples. Spectra were cut to the 1,800–900 cm−1 region, baseline-corrected and normalised to the amide I band at 1,650 cm−1 [22]. Min–max normalisation was used because amide I is a stable band in our samples. All IR spectra in sets A and B were acquired under the same experimental conditions and pre-processed in the same manner. This resulted in 2,100 spectra, each containing 235 data points (absorbance intensities).
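The cut, baseline-correction and normalisation steps described above can be sketched as follows. The text does not specify the baseline algorithm, so a straight line through the end points of the 1,800–900 cm−1 region is assumed here, and the min–max step is assumed to map the minimum to 0 and the amide I absorbance (1,650 cm−1) to 1; the input spectrum is synthetic.

```python
import numpy as np


def preprocess(wavenumbers, absorbance, lo=900.0, hi=1800.0, ref=1650.0):
    """Cut, baseline-correct and normalise one spectrum to the amide I band.

    Assumptions for illustration: a straight-line baseline through the end points of
    the cut region, and min-max scaling so that the minimum becomes 0 and the
    absorbance at the point nearest `ref` (amide I) becomes 1.
    """
    order = np.argsort(wavenumbers)                      # work on an ascending axis
    x, y = wavenumbers[order], absorbance[order]
    keep = (x >= lo) & (x <= hi)                         # cut to the fingerprint region
    x, y = x[keep], y[keep]
    y = y - np.interp(x, [x[0], x[-1]], [y[0], y[-1]])   # subtract linear baseline
    y = y - y.min()                                      # minimum -> 0
    ref_idx = np.argmin(np.abs(x - ref))                 # data point closest to amide I
    return x, y / y[ref_idx]                             # amide I -> 1


# Hypothetical spectrum on a 1 cm^-1 grid: amide I and II bands on a sloping background.
wn = np.linspace(4000, 400, 3601)
spectrum = (0.8 * np.exp(-((wn - 1650) / 20) ** 2)
            + 0.5 * np.exp(-((wn - 1550) / 20) ** 2) + 1e-4 * wn)
x, y = preprocess(wn, spectrum)
print(len(x), y.min(), y[np.argmin(np.abs(x - 1650))])
```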

Pattern classification methods

Four classifiers (eClass, k-NN, ANN and SVM) were employed in this study. All classifier models were trained and tested on exactly the same {training, test} dataset pairs to provide a “fair” comparison between them. Training sets consisted of all of set A with varying amounts and combinations of set B, the aim being to quantify classifier performance as a function of the number of set B patients in the training dataset. Of the 235 data points in each IR spectrum, only the top 25 biologically significant band positions, selected by eClass using set A alone, were used in the classifiers [17] (Fig. 1). eClass selected these 25 band positions for their predictive power in the original dataset (set A), and using them gave an overall improved classification rate compared with using all 235 data points. Furthermore, many coincided with band positions previously identified as important for distinguishing grades of cervical cytology using principal component analysis-linear discriminant analysis (PCA-LDA) [11]. In this instance, PCA-LDA was run independently on the same data and the loadings were interpreted to identify the band positions responsible for inter-category variance. To allow a comparison of computational techniques, the same 25 band positions were used throughout this investigation.
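The PCA-LDA cross-check mentioned above can be illustrated with a minimal numpy sketch: reduce the spectra by PCA, fit Fisher LDA on the PC scores, then project the discriminant directions back through the PCA loadings so that every wavenumber receives a loading whose magnitude indicates its contribution to inter-category variance. The number of retained PCs (10) and the synthetic data are assumptions, and this is not necessarily the implementation used in [11].

```python
import numpy as np


def pca_lda_loadings(X, y, n_pcs=10):
    """PCA-LDA loadings: one column per discriminant function, one row per variable."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal axes
    P = Vt[:n_pcs].T                                     # (variables, n_pcs)
    S = Xc @ P                                           # PC scores
    classes = np.unique(y)
    grand_mean = S.mean(axis=0)
    Sw = np.zeros((n_pcs, n_pcs))                        # within-class scatter
    Sb = np.zeros((n_pcs, n_pcs))                        # between-class scatter
    for c in classes:
        Sc = S[y == c]
        mc = Sc.mean(axis=0)
        Sw += (Sc - mc).T @ (Sc - mc)
        d = (mc - grand_mean)[:, None]
        Sb += len(Sc) * (d @ d.T)
    # Fisher LDA on the scores: leading eigenvectors of Sw^-1 Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][: len(classes) - 1]
    W = evecs[:, order].real                             # (n_pcs, n_classes - 1)
    return P @ W                                         # back to variable space


# Hypothetical data: 3 classes x 60 spectra x 30 variables; only variable 7 differs by class.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 60)
X = rng.normal(size=(180, 30))
X[:, 7] += 3.0 * y
L = pca_lda_loadings(X, y)
print(np.argsort(-np.abs(L[:, 0]))[:5])  # variables ranked by |loading| on LD1
```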
Fig. 1

Average spectrum for each original class [normal (set A), grey line; low-grade (set A), grey dashed line; high-grade (set A), black dotted line] and average set B spectrum [low-grade (set B), black dot-dash line] plotted with the 25 selected features (solid line)

eClass

eClass is an evolving neuro-fuzzy classifier. During training, a set of fuzzy rules is formed describing the entities (absorbance intensities) important for the classification of each class; these constantly adjust to the available training data. One of the advantages of eClass is that it does not require parameter optimisation as its only parameter ‘scale’ can be directly inferred from the training data. The adaptive nature of eClass makes it unique among the classifiers used here as it facilitates the incorporation of new data without retraining. The classification time of eClass is proportional to the number of fuzzy rules in the model, the number of which tends to stabilise after a particular stage of training.
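Incorporating new spectra without retraining relies on quantities that can be updated recursively as each sample arrives. As an illustration, the sketch below computes a Cauchy-type "potential" (data density) of the kind used in the evolving clustering literature on which eClass builds: the potential of a new sample is 1 / (1 + its mean squared distance to all previous samples), obtained from running sums so that earlier spectra never need to be revisited. The class name is hypothetical and this is a simplified sketch of the idea, not the published eClass update rules.

```python
import numpy as np


class RecursivePotential:
    """Potential of each new sample with respect to all samples seen before it.

    potential(z_k) = 1 / (1 + mean squared Euclidean distance from z_k to z_1..z_{k-1}),
    computed from running sums, so memory use does not grow with the data.
    """

    def __init__(self, dim: int):
        self.k = 0                   # number of samples absorbed so far
        self.sum_z = np.zeros(dim)   # running sum of samples
        self.sum_sq = 0.0            # running sum of squared norms

    def update(self, z: np.ndarray) -> float:
        """Return the potential of z given previous samples, then absorb z."""
        if self.k == 0:
            potential = 1.0          # first sample: maximal potential by convention
        else:
            # (1/k) * sum_l ||z - z_l||^2 expanded into running sums
            mean_sq_dist = z @ z - 2.0 * (z @ self.sum_z) / self.k + self.sum_sq / self.k
            potential = 1.0 / (1.0 + mean_sq_dist)
        self.k += 1
        self.sum_z += z
        self.sum_sq += z @ z
        return potential


rng = np.random.default_rng(1)
data = rng.normal(size=(50, 5))      # hypothetical 5-band spectra
rp = RecursivePotential(dim=5)
pots = [rp.update(z) for z in data]
print(round(pots[-1], 3))
```

Keeping one such accumulator per class loosely corresponds to the per-class ("local") potentials used in the variant of eClass adopted in this paper.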

The eClass0 model has been used herein. eClass0 possesses a zero-order Takagi–Sugeno consequent (i.e., the “after-THEN” part), which is easier to read and understand than eClass1 originally used in [14]. A fuzzy rule in the eClass0 model has the following structure:
$$ \text{IF } (W_1 \text{ is } C_{i1}) \text{ and } \ldots \text{ and } (W_n \text{ is } C_{in}) \text{ THEN Class} = \text{Class}_i \tag{1} $$

In the formula above, i = 1, 2, …, R, where R is the number of rules; j = 1, …, n, where n is the number of input variables (absorbance intensities); W1, …, Wn are the input variables; Cij is the jth coordinate of the cluster centre of the ith fuzzy rule; and Classi ∈ {“Normal”, “Low-grade”, “High-grade”} is the class label associated with the ith rule. The eClass0 model is composed of several fuzzy rules of this form, and more than one rule can exist per class. In the training stage (when spectra with known classes are presented to the algorithm), the rules are formed from scratch using an evolving clustering approach that decides when to create new rules. The cluster centres (Cij) are represented by chosen spectra (prototypes) within the data. In this paper, we used the variant of evolving clustering based on the “local (per-class) potentials” introduced in [28]. In the test stage (when a spectrum of unknown class is presented to the classifier), the different Classi outputs are combined (e.g., using a voting criterion) to give the final class. Full details of the eClass model and learning algorithm can be found in [17, 24–29].
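To make the rule structure of Eq. (1) concrete, the sketch below evaluates a small zero-order rule base: each rule stores a prototype spectrum (its centres C_i1 … C_in) and a class label, the firing strength is the product of per-input membership degrees, and the class of the most strongly firing rule is returned (winner-takes-all). The Gaussian membership function, the fixed width of 0.1 and the winner-takes-all combination are assumptions for illustration; the membership functions and output combination actually used by eClass0 are described in [17, 24–29].

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


@dataclass
class FuzzyRule:
    centre: np.ndarray  # C_i1 ... C_in: prototype spectrum, one value per input band
    label: str          # Class_i, the zero-order consequent of the rule


def log_firing_strength(rule: FuzzyRule, w: np.ndarray, width: float) -> float:
    """Log of the degree to which 'W_1 is C_i1 and ... and W_n is C_in' holds.

    Each clause uses an assumed Gaussian membership exp(-(W_j - C_ij)^2 / (2 width^2));
    'and' is the product, so its log is a sum (avoiding underflow over many bands).
    """
    return float(-np.sum((w - rule.centre) ** 2) / (2.0 * width ** 2))


def classify(rules: list[FuzzyRule], w: np.ndarray, width: float = 0.1) -> str:
    """Winner-takes-all: return Class_i of the most strongly firing rule."""
    strengths = [log_firing_strength(r, w, width) for r in rules]
    return rules[int(np.argmax(strengths))].label


# Hypothetical three-band rule base (the models in this study use 25 band positions);
# note that two rules share the "Low-grade" class.
rules = [
    FuzzyRule(np.array([0.2, 0.8, 0.5]), "Normal"),
    FuzzyRule(np.array([0.3, 0.6, 0.7]), "Low-grade"),
    FuzzyRule(np.array([0.5, 0.4, 0.9]), "Low-grade"),
    FuzzyRule(np.array([0.6, 0.3, 0.9]), "High-grade"),
]
print(classify(rules, np.array([0.31, 0.62, 0.68])))  # Low-grade
```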

k-Nearest neighbours

The k-NN classifier has no true training stage and instead works as follows. For an unknown spectrum z, the algorithm finds the k spectra in the training set that are closest to z (e.g., by Euclidean distance) and classifies z according to the majority vote among these k spectra [30]. The main flaw of k-NN is its lack of structure: the connection between input data and predictions is poorly understood. Additional drawbacks are the memory required to store the training data and the classification time, which increases at a rate of n × log2n with the training dataset size n [30]. The optimum k value was found to be 1.
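The k-NN procedure above can be sketched in a few lines of numpy, assuming Euclidean distance; the function name and the toy two-variable "spectra" are hypothetical. The whole training set is kept in memory and all n distances are sorted for every query, which is where the memory and O(n log n) time costs come from.

```python
import numpy as np


def knn_predict(train_X, train_y, z, k=1):
    """Classify spectrum z by majority vote among its k nearest training spectra."""
    dists = np.linalg.norm(train_X - z, axis=1)      # Euclidean distance to every spectrum
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # ties go to the alphabetically first label


train_X = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]])
train_y = np.array(["Low-grade", "Low-grade", "High-grade"])
print(knn_predict(train_X, train_y, np.array([0.85, 0.2])))  # k = 1, as found optimal
```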

Artificial neural networks

ANN architectures vary according to the application. ANNs suitable for classifying spectroscopy data consist of interconnected units, perceptrons, arranged in layers [30]. Each perceptron is a nonlinear model whose inputs are the outputs of the previous layer. The performance of an ANN depends heavily on its structure, and choosing the number of neurons in each layer and the activation functions can be challenging, as a compromise must be struck between classification rate and overfitting. ANN models are difficult to interpret because their structures and internal parameters disclose little about the relations between input data and class predictions [30]. ANN classification times are fixed and depend only on the structure of the model. The optimum ANN configuration for this application has two hidden layers of 20 and 11 perceptrons, respectively (trained with the Levenberg–Marquardt backpropagation algorithm), with sigmoid activation functions in the hidden layers and a linear activation function in the output layer [31].
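The network shape described above can be sketched as a forward pass in numpy: 25 inputs (the selected band positions), hidden layers of 20 and 11 sigmoid units, and a linear output layer. A three-unit output (one score per class) is an assumption, the weights are random rather than trained, and Levenberg–Marquardt training is not shown. The fixed number of weights is why classification time depends only on the model structure.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_mlp(sizes, rng):
    """Small random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Forward pass: sigmoid activations in hidden layers, linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = sigmoid(h @ W + b)
    W_out, b_out = params[-1]
    return h @ W_out + b_out


rng = np.random.default_rng(0)
params = init_mlp([25, 20, 11, 3], rng)          # 25 bands -> 20 -> 11 -> 3 class scores
scores = forward(params, rng.random((4, 25)))    # four hypothetical pre-processed spectra
n_params = sum(W.size + b.size for W, b in params)
print(scores.shape, n_params)                    # (4, 3) 787
```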

Support vector machines

The SVM classifier is a “machine” built from two-class support vector classifiers, with the final class decided by majority vote. In training mode, each support vector (SV) classifier tries to find two parallel hyperplanes (“walls”) with no training data points between them and with all the points from each respective class on either side [32, 33]. The distance between these two planes (the margin) is maximised during training, which is intended to reduce the misclassification rate. The data points that touch either wall are called SVs. The original SVMs are linear classifiers; nonlinear variants are obtained by applying the mathematical formulation known as “the kernel trick” [32, 33]. SVMs can produce good classification rates, although with these spectroscopy data the classifiers contained a number of SVs similar to the number of training data points, an indication of potential overfitting. This is problematic when the volume of data is large (hundreds, thousands or millions of samples, e.g., in a potential cervical cancer screening programme). The classification time of SVMs increases in proportion to the number of samples in the training set. The SVM classifiers were configured with a Gaussian kernel and parameters C = 4 and γ = 0.078 [32, 33].
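A full SVM solver is beyond a short example, but the two pieces specific to the configuration above can be sketched: the Gaussian kernel K(x, y) = exp(−γ‖x − y‖²) with γ = 0.078, and the majority vote over two-class decisions. The pairwise decisions in the usage example are hypothetical placeholders for trained SV classifiers; C = 4 only enters during training, so it does not appear here.

```python
from collections import Counter
from itertools import combinations

import numpy as np

GAMMA = 0.078  # Gaussian kernel parameter quoted in the text


def gaussian_kernel(x, y, gamma=GAMMA):
    """K(x, y) = exp(-gamma * ||x - y||^2); equals 1 for identical spectra."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * (d @ d)))


def one_vs_one_vote(pairwise_winner, classes):
    """Combine two-class decisions by majority vote.

    pairwise_winner(a, b) must return a or b; it stands in for a trained two-class
    SV classifier. Ties are broken in favour of the label that received a vote first.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


classes = ["Normal", "Low-grade", "High-grade"]
decisions = {("Normal", "Low-grade"): "Low-grade",      # hypothetical outputs
             ("Normal", "High-grade"): "Normal",
             ("Low-grade", "High-grade"): "Low-grade"}
print(one_vs_one_vote(lambda a, b: decisions[(a, b)], classes))  # Low-grade
```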

Results

The average IR spectrum of each class in set A (normal, low-grade and high-grade) was plotted alongside the average IR spectrum of set B. Although both were diagnosed as low-grade, there were obvious differences between the set A and set B low-grade IR spectra (Fig. 1). Pattern classification was applied to the dataset, and the classifiers were compared in terms of classification rate, interpretability and training time. Initially, all of the classifiers were trained on set A only, with set B used as ‘unknown’ data. The results were poor (≈10–20%) across all classifiers, demonstrating that the two datasets were very dissimilar (Table 1). The classifiers therefore had to be trained with data from set B in addition to all the samples from set A; the process is shown schematically in Fig. 2.
Table 1

Correct classification rate for each classifier with varying numbers of patients from set B used in training

| Training set | eClass (%) | SVM (%) | ANN (%) | k-NN (%) |
|---|---|---|---|---|
| Original training set (set A) | 21 | 9 | 15 | 16 |
| set A + 3 patients (10%) from set B | 37 | 41 | 24 | 49 |
| set A + 6 patients (20%) from set B | 57 | 62 | 22 | 61 |
| set A + 9 patients (30%) from set B | 66 | 73 | 32 | 67 |

Development of the training set towards predictive classification. Initially, only set A data were used for training and this was expanded in an incremental fashion by incorporating 10%, 20% or 30% of data from set B. This increased the robustness of the predictive classification when using the remainder of set B data as an ‘unknown’ in each case.

Fig. 2

Overall schematic representation of the approach step-by-step. Spectra shown are low-grade set B, averaged per patient

To simulate the performance of the classification algorithms in a real-world application, where new data would become available in small sets, the following evaluation procedure was set up. Let p (= 0–29) be the number of patients from set B included in a training set. Of the 2,100 spectra (set A = 1,800; set B = 300), 1,800 + 10p spectra were used in training and the remaining 10 × (30 − p) in testing, for p = 0–29. p was varied in this way to follow the improvement in classification rate as more patients from set B were added to the training set (each increment of p corresponds to one more set B patient in the training set). Because classifier performance varies with the combination of patients used in training, the procedure of increasing p from 0 to 29 was repeated 100 times, each time with a different sequence in which the set B patients were added to the training set. Each time a set B patient is added to the training set, the eClass model is adapted to the new data, whereas the other classifier models must be retrained. Averages were taken for the following quantities as functions of p: classification rate, training time (except k-NN), number of rules (eClass only) and number of SVs (SVM only).
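The incremental evaluation protocol can be sketched as a generator of train/test patient splits. The random seed and the use of Python's `random` module are illustrative assumptions; only the split sizes (1,800 + 10p training spectra, 10 × (30 − p) test spectra) and the 100 random orderings of set B come from the text.

```python
import random

N_SET_A = 1800            # spectra in set A (always in training)
SPECTRA_PER_PATIENT = 10  # spectra acquired per set B patient


def incremental_splits(n_repeats=100, n_b_patients=30, seed=0):
    """Yield (repeat, p, set B patients in training, set B patients in testing).

    Each repeat draws a new random order of the set B patients; at step p the first
    p patients in that order join set A in training and the rest form the test set.
    """
    rng = random.Random(seed)
    patients = list(range(1, n_b_patients + 1))
    for r in range(n_repeats):
        order = patients[:]
        rng.shuffle(order)
        for p in range(n_b_patients):            # p = 0 .. 29
            yield r, p, order[:p], order[p:]


splits = list(incremental_splits())
print(len(splits))  # 3000: 100 repeats x 30 values of p
_, p, train_b, test_b = splits[-1]
print(p, N_SET_A + SPECTRA_PER_PATIENT * len(train_b), SPECTRA_PER_PATIENT * len(test_b))
```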

eClass

Preliminary classification of set B (with no set B patients in the training set) was very poor (≈21%). Adding set B patients to the training set improved the classification rate substantially (≈67–81%; Tables 1 and 2). Experiments were carried out using three two-class classifiers [CNL (normal vs. low-grade), CNH (normal vs. high-grade) and CLH (low-grade vs. high-grade)] combined in a cascade to produce an overall classification rate [17]. Further tests compared this cascade of three two-class classifiers with a single overall classifier [CNLH (normal, low-grade and high-grade)] combining all three classes. Both approaches have comparable classification rates when 100 spectra (ten patients) from set B are used in training (Table 2). In addition, CNLH is more user-friendly and easier to interpret than the cascade classifier, and it was therefore adopted hereafter.
Table 2

Correct eClass classification rates, with the number of fuzzy rules formed and the number of band positions used

| Classifier | Rules | Band positions | Classification rate (%) |
|---|---|---|---|
| CNL (+p = 10) | 8 | 25 | 83 |
| CLH (+p = 10) | 8 | 25 | 70 |
| CNLH (+p = 10) | 34 | 25 | 70 |
| CNLH (+p = 15) | 34 | 25 | 76 |
| CNLH (+p = 20) | 34 | 25 | 81 |

The classifiers are as follows: CNL (+p = 10) is normal and low-grade data from set A as a training set combined with 100 spectra (ten patients) from set B; CLH (+p = 10) is low-grade and high-grade data from set A as a training set combined with 100 spectra (ten patients) from set B; CNLH is normal, low-grade and high-grade data from set A as a training set combined with 100 or 150 or 200 spectra (10, 15 or 20 patients, as indicated) from set B. For each classifier, the remaining set B data were used for predictive classification.

p patients

The effect of adding varying numbers of set B patients to the training set was explored further (Fig. 3a). Predictions of IR spectra were accumulated for each patient to produce a per-patient classification rate, combining results for p = 29 over 2,000 different permutations of set B (Table 3). Table 3 is a “per-patient confusion matrix”: eClass was trained with set A plus 29 patients from set B, and the remaining set B patient constituted the test set. This train–test setup was repeated 2,000 times to obtain Table 3, so each patient was tested 2,000/30 ≈ 67 times on average. Each row of the table shows the percentage of test runs in which the corresponding patient was classified as normal, low-grade or high-grade. The overall classification rates were ≈88% per spectrum and 97% per patient. One patient (B-13) was misclassified (42% normal, 47% low-grade and 11% high-grade predictions) and tested negative for HPV. Clinically, any patient showing any sign of a high-grade lesion should be examined further. This classification assigns some high-grade predictions to a further nine patients, five of whom have high-risk HPV, including two positive for HPV16 (a high oncogenic risk).
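The per-patient aggregation can be sketched in two steps: count the predicted labels for each patient across repeated test runs and convert them to percentages (as in the rows of Table 3), then summarise. The per-patient decision rule is not stated in the text; a strict majority (more than 50% of that patient's predictions low-grade) is assumed here, which is consistent with 29 of 30 patients (≈97%) being counted correct. Averaging the per-patient percentages treats every patient as tested equally often, which only approximates the ≈67 tests per patient.

```python
from collections import Counter, defaultdict


def per_patient_percentages(records):
    """records: iterable of (patient_id, predicted_label) pairs from repeated test runs."""
    counts = defaultdict(Counter)
    for patient, label in records:
        counts[patient][label] += 1
    return {p: {lab: 100.0 * n / sum(c.values()) for lab, n in c.items()}
            for p, c in counts.items()}


# Low-grade predictions (%) for patients B-1 ... B-30, copied from Table 3.
LOW_GRADE_PCT = [93, 91, 78, 98, 76, 99, 76, 82, 100, 97,
                 100, 90, 47, 77, 68, 98, 86, 94, 76, 98,
                 74, 84, 100, 92, 97, 100, 99, 98, 84, 93]


def per_spectrum_rate(pcts):
    """Mean % of test spectra classified as low-grade (the true class of set B)."""
    return sum(pcts) / len(pcts)


def per_patient_rate(pcts, threshold=50.0):
    """% of patients with more than `threshold` % low-grade predictions (assumed rule)."""
    return 100.0 * sum(1 for x in pcts if x > threshold) / len(pcts)


print(round(per_spectrum_rate(LOW_GRADE_PCT), 1))  # close to the ~88% per spectrum
print(round(per_patient_rate(LOW_GRADE_PCT), 1))   # 29/30 patients; B-13 (47%) fails
```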
Fig. 3

Average classification rates for each classifier when 1–25 set B patients are used in training; the remaining set B data are tested to generate the classification rate. a Results from k-NN (black dotted line), SVM (grey short-dashed line), ANN (light grey long-dashed line) and eClass (black solid line). b Normalised average accumulated training times for eClass (black solid line), SVM (grey short-dashed line) and ANN (light grey long-dashed line). The accumulated training time for a classifier is the average total time spent training the classifier when up to p set B patients are available. Normalisation was accomplished by dividing each classifier’s average accumulated training time by its initial training time (the time corresponding to all of set A + 1 set B patient). k-NN does not appear in the figure because it has no training time (see “Classifier times” and “Comparison of eClass to alternative classifiers”)

Table 3

Final eClass predictions (%) of unknown spectra per patient in set B designated as either normal, low-grade or high-grade

| Patient (set B) | Normal (%) | Low-grade (%) | High-grade (%) | HPV status | Age (years) |
|---|---|---|---|---|---|
| B-1 | 6 | 93 | 1 | Low-risk | 42 |
| B-2 | 9 | 91 | 0 | Negative | 44 |
| B-3 | 22 | 78 | 0 | Unknown | 45 |
| B-4 | 2 | 98 | 0 | Negative | 35 |
| B-5 | 19 | 76 | 5 | Low-risk | 30 |
| B-6 | 1 | 99 | 0 | Negative | 25 |
| B-7 | 24 | 76 | 0 | High-risk | 38 |
| B-8 | 18 | 82 | 0 | High-risk | 27 |
| B-9 | 0 | 100 | 0 | High-risk | 28 |
| B-10 | 3 | 97 | 0 | Negative | 26 |
| B-11 | 0 | 100 | 0 | Unknown | 36 |
| B-12 | 8 | 90 | 2 | High-risk | 39 |
| B-13 | 42 | 47 | 11 | Negative | 33 |
| B-14 | 23 | 77 | 0 | High-risk | 30 |
| B-15 | 26 | 68 | 5 | Low-risk | 22 |
| B-16 | 2 | 98 | 0 | High-risk | 30 |
| B-17 | 14 | 86 | 0 | Probable high-risk | 20 |
| B-18 | 6 | 94 | 0 | High-risk | 36 |
| B-19 | 22 | 76 | 2 | High-risk | 37 |
| B-20 | 2 | 98 | 0 | High-risk | 34 |
| B-21 | 25 | 74 | 1 | High-risk | 43 |
| B-22 | 13 | 84 | 3 | High-risk | 43 |
| B-23 | 0 | 100 | 0 | Low-risk | 52 |
| B-24 | 6 | 92 | 2 | High-risk | 39 |
| B-25 | 3 | 97 | 0 | Low-risk | 40 |
| B-26 | 0 | 100 | 0 | High-risk | 24 |
| B-27 | 1 | 99 | 0 | Negative | 34 |
| B-28 | 0 | 98 | 2 | Negative | 21 |
| B-29 | 16 | 84 | 0 | Unknown | 27 |
| B-30 | 7 | 93 | 0 | High-risk | 21 |
| Average | 11 | 88 | 1 | na | na |

For the overall (average) classification of set B (previously designated as low-grade using conventional cytological screening), prediction as an unknown resulted in 88% low-grade, 11% normal and 1% high-grade classification. The training consisted of set A + 29 patients from set B amalgamated over 2,000 combinations within the training set. HPV status was designated as negative, low-risk or high-risk for the presence of HPV genotype infection

na not applicable

Fig. 4 shows one fuzzy rule extracted from a classifier model as a real-world example of the structure presented in Eq. 1; its consequent (‘THEN class is…’) simply assigns overall membership to one class.
Fig. 4

An example of a fuzzy rule. A number of fuzzy rules, per class, are formed during the training stage of eClass to form a predictive model. The final line of the rule provides the membership to a particular class for which the rule exists

Alternative classifiers

Several parameter configurations were tried for each alternative classifier to optimise its parameters for the spectral data; the parameters used hereafter are those that gave the highest classification rate for the respective classifier. For all classifiers tested here, the initial classification rates obtained using only set A in training are poor compared with those obtained by training on a combination of set A and set B data (Table 1).

Compared with the other classifiers, k-NN is stable (its classification rate has low variance for a given p; Fig. 3a). The results for ANN are particularly unstable, and its overall classification rate is lower than those of the other classifiers. SVM appears superior to the other classifiers, with the highest classification rate, which increases steadily as set B patients are added to the training set. However, closer inspection reveals an inefficient, highly complex model: the number of SVs was between 80% and 84% of the total number of IR spectra in the training set, an indication of possible overfitting.

Classifier times

The accumulated training times were recorded and plotted for comparison (Fig. 3b). The eClass training time increases gradually, whilst ANN and SVM must be retrained whenever new information (spectral data) is added. Consequently, the accumulated training time for ANN and SVM increases dramatically by the time 25 set B patients have been added to the training set. k-NN is not displayed as it has no training stage.
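Why retraining classifiers accumulate time so much faster than an incremental learner can be seen from a toy cost model. The sketch assumes, purely for illustration, that training cost is linear in the number of spectra processed: the incremental learner processes only each new batch, whereas a retraining classifier reprocesses the whole, growing training set every time. Real SVM and ANN training is typically costlier than linear, which would widen the gap further.

```python
def accumulated_times(initial_size, batch_sizes, cost_per_spectrum=1.0):
    """Accumulated training cost under an assumed linear cost model.

    Returns two lists (incremental, retrain), one entry per training event.
    """
    incremental = [initial_size * cost_per_spectrum]
    retrain = [initial_size * cost_per_spectrum]
    size = initial_size
    for batch in batch_sizes:
        size += batch
        # Incremental learner: pays only for the newly added spectra.
        incremental.append(incremental[-1] + batch * cost_per_spectrum)
        # Batch learner: retrains on everything seen so far.
        retrain.append(retrain[-1] + size * cost_per_spectrum)
    return incremental, retrain


inc, ret = accumulated_times(100, [10, 10])
print(inc)  # [100.0, 110.0, 120.0]
print(ret)  # [100.0, 210.0, 330.0]
```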

Discussion

A high-throughput instrument combined with a predictive tool is required to advance cervical cytology and improve its interpretation. eClass can correctly distinguish between normal cytology and low-grade and high-grade pre-cancerous cervical lesions [17]. Furthermore, interpreting the cases in which IR spectra are misclassified could potentially reveal the regression or progression pattern of low-grade cytology, effectively subdividing the class, i.e., providing a dichotomous biomarker. Determining which patients will progress to invasive carcinoma is not currently possible, but would be of great clinical importance in the follow-up of women with low-grade cervical cytology.

There are spectral differences between the set A and set B data, including differences between the set A low-grade and set B spectra (Fig. 1). Multiple factors may account for this, including the country of origin and, consequently, laboratory and sample collection techniques and grading by different cytologists; inter-individual confounding factors such as lifestyle and diet would also be expected to play a role. Such differences in the spectra and in the distribution of the datasets reflect a real-world situation, and the ability of a classifier to account for them is vital for the development of an accurate, high-throughput screening programme. Although more could be done (e.g., ensuring that grading guidelines are identical so that all samples are treated identically), this result is very encouraging. The low classification rates obtained when training with set A data alone suggest the need for an adaptive or more complex model.

The potential of this technique as a future diagnostic tool relies on the adaptive abilities of the classifier: if a new sample population were introduced, a small amount of data (graded by a cytologist) could be added to the training set and the remaining patients predicted with confidence (≈88%). This algorithm and approach could also be adapted for other biological applications that require prediction from large volumes of data. Applying eClass to categorise IR spectra is a unique approach, able to evolve from new data whilst remaining robust. In addition, the potential to subdivide low-grade samples according to progression has been explored.

eClass

The initially low classification rate, i.e., with no or only a few set B patients in training, supports the existence of differences between set A and set B. Incrementally adding set B patients to the training set yields a reliably high classification rate. The adaptive character of eClass is demonstrated by the changing set of fuzzy rules and the changing classification rate. The ability of the classifier to evolve is essential for its transfer into a real-world clinical setting, as rerunning the algorithm from the beginning with each addition of new data would be computationally inefficient and time-consuming. The structure of the set A and set B data is largely the same, so only small updates to the classifier are required. The three two-class classifiers (CNL and CLH) and CNLH were found to perform very similarly when ten patients from set B were used in training. CNLH is less complex to compute and easier to interpret than the three two-class cascade classifier, and is therefore more suitable for future applications. The CNLH classification rate remained stable as the training data increased (≈82% when p = 25). Low-grade cervical samples are hard to distinguish because they share characteristics with both normal and high-grade samples.
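The benefit of updating a classifier with new data instead of rerunning it from the beginning can be illustrated with a much simpler evolving model. The sketch below is a stand-in, not eClass: it is a hypothetical nearest-prototype classifier whose class means are updated recursively (mean_k = mean_{k-1} + (x - mean_{k-1}) / k), so each new spectrum costs a single pass over its features and no earlier data need to be revisited.

```python
import math


class IncrementalPrototypeClassifier:
    """Hypothetical evolving classifier: one recursively updated mean per class."""

    def __init__(self):
        self.means = {}   # label -> running mean spectrum
        self.counts = {}  # label -> number of spectra seen for that label

    def update(self, x, label):
        if label not in self.means:
            # A new class simply creates a new prototype (structure evolves).
            self.means[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        k = self.counts[label]
        mean = self.means[label]
        for i, xi in enumerate(x):
            mean[i] += (xi - mean[i]) / k  # recursive mean; old data not stored

    def predict(self, x):
        # Assign to the class whose prototype is closest (Euclidean distance).
        return min(self.means, key=lambda lbl: math.dist(self.means[lbl], x))


clf = IncrementalPrototypeClassifier()
for spectrum, label in [([0, 0], "normal"), ([2, 2], "normal"), ([10, 10], "high-grade")]:
    clf.update(spectrum, label)
print(clf.means["normal"])   # [1.0, 1.0]
print(clf.predict([2, 3]))   # normal
```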

We tentatively suggest that low-grade cervical cytology can be subdivided by ATR-FTIR spectroscopy coupled with eClass, with eClass detecting the biological progression that exists within low-grade samples. At certain stages, cytology produces the same diagnosis as eClass. However, earlier prediction could be possible by further monitoring in conjunction with eClass to determine whether a patient will progress or regress, potentially before cellular changes are observable by a cytologist. By comparing the per-patient ratio of normal/low-grade/high-grade classifications, we suggest that eClass can predict the regression or progression of the patient. In particular, patient B-13, who tested negative for HPV, is predicted as 42% normal/47% low-grade/11% high-grade. A further nine patients had a high-grade prediction of at least 1%, which clinically would lead to further examination. Of these, five tested positive for high-risk HPV, including two positive for HPV16. HPV16 is a known oncogenic type, so the hypothesis that these patients could be progressing towards high-grade disease is plausible. Twelve patients have a classification suggestive of regression to normal.
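The per-patient ratio described above is just the proportion of a patient's spectra assigned to each class. In the sketch below, the function names and the 100-spectrum example are hypothetical (the example merely mirrors the percentages quoted for B-13); the 1% high-grade threshold for further examination is taken from the text, while any criterion for "regression to normal" is not specified here and is therefore not implemented.

```python
from collections import Counter

CLASSES = ("normal", "low-grade", "high-grade")


def patient_profile(predictions):
    """Fraction of a patient's spectra predicted in each class."""
    counts = Counter(predictions)
    total = len(predictions)
    return {c: counts[c] / total for c in CLASSES}


def flag_for_follow_up(profile, high_grade_threshold=0.01):
    # At least 1% high-grade predictions -> candidate for further examination.
    return profile["high-grade"] >= high_grade_threshold


# Hypothetical per-spectrum predictions for one patient (100 spectra).
preds = ["normal"] * 42 + ["low-grade"] * 47 + ["high-grade"] * 11
profile = patient_profile(preds)
print(profile)                      # {'normal': 0.42, 'low-grade': 0.47, 'high-grade': 0.11}
print(flag_for_follow_up(profile))  # True
```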

Observing the proportions of set B predictions across the three classes may lead to a clearer understanding of the class and progression patterns of individual patients, a unique advantage in the classification of low-grade cervical cytology. This novel application differs from the traditional cervical cytology approach and brings several advantages, including adaptability, robustness, further information on the biological changes occurring, and the ability to classify large real-world datasets in a high-throughput manner.

Comparison of eClass to alternative classifiers

Comparing eClass with alternative classifiers clarifies the criteria a classifier must meet for this application. Although SVM has the highest classification rate, its computational complexity, user- and problem-specific settings, potential for overfitting and large memory requirements greatly reduce its suitability for this application. ANN is complex and unstable, and exhibited the lowest classification rate. k-NN produced results similar to eClass, with consistently high classification rates. However, although simplicity and user-friendliness can be assets in a classifier, in this instance k-NN's lack of structure, and therefore of interpretability, is a major flaw. eClass offers a linguistic and visual interpretation, and information on its structure can be extracted at any time, enhancing understanding of the dataset [17].

Overall, although eClass does not have the highest classification rates, its adaptive, automatic (no user-tuned parameters) and interpretable nature ultimately makes it preferable to the alternative classifiers for this application. The accumulated training time of eClass increases gradually as set B data are incrementally added to the training set, from which it adapts and evolves; in the end, it needs only 37% more than its initial training time to incorporate the information from the additional 25 patients into its structure. The alternative classifiers must be retrained whenever new data points are added, incurring a significant time overhead each time a few new samples become available. Indeed, the need for retraining is a severe limitation in a clinical setting, as each retraining takes longer as the dataset grows. The retraining time of these classifiers therefore grows without bound as the dataset grows, whereas eClass incorporates new data in a time proportional only to the number of new spectra. k-NN is absent from Fig. 3b because it has no training stage; however, its classification time for each spectrum increases proportionally to n × log2n, where n is the number of spectra in the training set.
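The n × log2n cost per classified spectrum corresponds to a brute-force k-NN that computes the distance from the query to all n training spectra and then sorts them. The following is a minimal sketch of that variant, assuming Euclidean distance and majority voting; the function name and toy data are hypothetical, and the actual k-NN configuration used in the study is not reproduced here.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, query, k=3):
    """Brute-force k-NN: n distance computations, then an O(n log n) sort."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    # Majority vote among the k nearest training spectra.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_x = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["normal", "normal", "high-grade", "high-grade"]
print(knn_classify(train_x, train_y, [0.2, 0.3]))  # normal
```

Every query scans the whole training set, so classification slows as the training set grows even though there is no training stage. Replacing the full sort with `heapq.nsmallest(k, ...)` would reduce the selection step to O(n log k), but the n distance computations per query remain.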

Conclusions

eClass applied to IR spectroscopy data is a novel approach. Here, we have shown that it produces reliable and informative results and can incorporate data of a similar nature from different environments. Although this approach has potential, further improvements could be incorporated in future classifiers. For instance, the number of features employed in the classifier could be further reduced by grouping together neighbouring band positions with similar absorption values. This would yield a simpler model with a higher level of interpretability. It is also a realistic way to handle the IR absorption spectra of biochemicals, as band positions are not completely independent of one another (i.e., each is not limited to one chemical bond), as can also be seen in Fig. 1. Employing this grouping method therefore has the potential to improve the classification rate further. The final test will be to validate this potential with larger cervical datasets of different origins and then to allow the classifier to classify these spectra simultaneously. Following prediction with eClass, a single cytologist should grade the samples to limit inter-observer variation.
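The proposed feature reduction, grouping neighbouring band positions with similar absorption values, could take many forms. One possible sketch, under explicit assumptions: groups are defined greedily on a reference spectrum (for example, a mean spectrum), a band joins the current group while its absorbance stays within a hypothetical tolerance `tol` of the group's first band, and each spectrum is then reduced to the mean absorbance within each group. The function names and tolerance are illustrative, not a method described in the study.

```python
def group_bands(reference, tol):
    """Greedily group consecutive band indices of a reference spectrum.

    A band joins the current group if its absorbance is within `tol`
    of the absorbance at the group's first band; otherwise it starts a new group.
    """
    if not reference:
        return []
    groups = [[0]]
    for i in range(1, len(reference)):
        if abs(reference[i] - reference[groups[-1][0]]) <= tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def reduce_spectrum(spectrum, groups):
    # One feature per group: the mean absorbance over the grouped band positions.
    return [sum(spectrum[i] for i in g) / len(g) for g in groups]


ref = [0.10, 0.11, 0.30, 0.31, 0.32, 0.05]
groups = group_bands(ref, tol=0.025)
print(groups)  # [[0, 1], [2, 3, 4], [5]]
print(reduce_spectrum(ref, groups))
```

The reference-based grouping is applied identically to every spectrum, so all reduced spectra share the same, smaller set of features.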

The application of eClass to IR spectral data of exfoliative cervical cytology has considerable potential in a clinical setting [34]. Crucially, as well as classifying each spectrum, it allows the per-patient ratio of normal/low-grade/high-grade classifications to be determined, identifying cytological changes within each sample. This information is potentially related to the progression/regression pattern of each patient and is particularly important, as distinguishing the patients within the low-grade category who are likely to progress has not previously been achieved in any context.

Acknowledgements

This work was sponsored by the Rosemere Cancer Foundation.

Copyright information

© Springer-Verlag 2010