1 Introduction

Oral carcinoma is an important global health problem, as evidenced by the approximately 40,250 new cases of oral and throat carcinoma detected in 2012 (American Cancer Society). In the United States, the overall 5-year survival rate for all stages of oral carcinoma is 61 % [1, 2]. Biopsy is the conventional means of diagnosing oral carcinoma [3, 4]. In addition to allowing pathologists to determine the stage of the carcinoma accurately, biopsy also allows them to prescribe the most appropriate treatment. However, pathologists differ in experience, and their evaluation of borderline dysplastic cells is subjective; both factors may affect diagnostic accuracy. A microscopic hyperspectral imaging system (MHIS), which records a tissue image together with the spectrum of every pixel in that image, was developed to support quantitative carcinoma diagnosis [5–16]. However, both the hardware and the analytical algorithms of the conventional MHIS required further improvement. Regarding hardware, the conventional MHIS was slow, had a complex mechanical structure and high off-axis optical aberration, and was inconvenient to align. The embedded relay lens MHIS (ERL-MHIS, Fig. 1) developed in our previous works overcomes these limitations [17, 18].

Fig. 1 (a) Diagram and (b) photograph of the ERL-MHIS (RL: relay lens; HS: hyperspectrometer; SM: stepping motor; IMP1: imaging plane 1; IMP2: imaging plane 2; FW: fluorescent wheel)

Certain limitations of the analytical algorithms have prevented widespread application of the MHIS in carcinoma diagnosis [11–15]. Siddiqi et al. [13] distinguished between normal and cancerous cells by using the nuclear spectrum and then refined the discrimination based on the nuclear/cytoplasmic ratio. Other works [11, 12, 14, 15] demonstrated that the nuclear spectrum differs quantitatively between normal, precancerous, and cancerous cells. However, these studies have several problems. First, they used biopsies from only one patient; although a 40× objective was used to examine every single cell in the biopsy, their conclusions may not generalize to all cancer patients. Second, examining every single cell in the biopsy was time-consuming. Third, most pathologists use a 20× objective [19], so a high-power objective (40× or higher) is inconvenient for them. Fourth, in a previous work, the sensitivity, the specificity, or both for discriminating between normal and precancerous cells were only about 70–80 % [13].

This work develops morphological and spectral methods and uses them to help pathologists diagnose oral carcinoma biopsies quantitatively. The proposed methods were used to diagnose 68 oral carcinoma biopsies from 34 patients using the ERL-MHIS with a 20× objective. In the spatial domain, the fractal dimension algorithm [20] can capture the morphological differences between normal and abnormal tissues, but abnormal tissue is not necessarily cancerous. Hence, normal and cancerous cells in the tissue are further distinguished in the spectral domain using five methods. However, the spectra of normal and cancerous cells vary between patients because of differences in sampling conditions (e.g., patient age, lesion site, tumor size, and lymph node metastasis) [15, 21, 22]. Therefore, this work develops a novel cocktail approach to reduce the effect of inter-patient differences in the cell spectrum. The cocktail approach determines which spectral methods are effective under each sampling condition, and the sample is then diagnosed using the optimal combination of the methods that are effective for its conditions.

2 Materials and Methods

2.1 ERL-MHIS

Figure 1(a) illustrates the functions and setup of the ERL-MHIS. The relay lens designed for scanning is placed between the microscope and the hyperspectrometer, and the stepping motor is located under the relay lens. The relay lens comprises symmetric infinite-conjugate lenses that scan and transfer images with minimal off-axis optical aberration. The objective lens images the object on the platform onto imaging plane 1. The relay lens then transfers the image from imaging plane 1 to imaging plane 2, where the hyperspectrometer slit is located; the slit is oriented along the y-axis. Only one line of imaging plane 2 (the slit width) is imaged at a time onto the electron-multiplying charge-coupled device (EMCCD). When the relay lens is static, the line image of the slit and its spectrum are recorded on the EMCCD as a y–λ image. The stepping motor then moves one step along the x-axis to capture the next line image and its spectrum, so that successive line images are recorded on the y–λ plane of the EMCCD as the scan proceeds along x. Each y–λ image, which corresponds to one row of the object in the radiation collection region mapped through the hyperspectrometer onto the EMCCD, is saved as a separate y–λ file. After all of the line images are captured, the y–λ files are loaded into memory as an (x, y, λ) data cube.
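The push-broom scan above can be sketched in a few lines. This is a minimal Python illustration, not the authors' C acquisition software: it assumes each captured y–λ frame arrives as a nested list `frame[y][band]` in stepping-motor order, and the names `assemble_cube`, `pixel_spectrum`, and `band_image` are hypothetical.

```python
from typing import List

Frame = List[List[float]]        # one y-lambda frame: frame[y][band]
Cube = List[List[List[float]]]   # data cube: cube[x][y][band]


def assemble_cube(frames: List[Frame]) -> Cube:
    """Stack y-lambda frames, one per stepping-motor position along x,
    into an (x, y, lambda) data cube."""
    if not frames:
        raise ValueError("no frames captured")
    n_y, n_bands = len(frames[0]), len(frames[0][0])
    for step, frame in enumerate(frames):
        # Every frame comes from the same slit and spectral range,
        # so all frames must share one (y, band) shape.
        if len(frame) != n_y or any(len(row) != n_bands for row in frame):
            raise ValueError(f"frame {step} has an inconsistent shape")
    # Copy the rows so that later edits to the cube do not alias the raw frames.
    return [[list(row) for row in frame] for frame in frames]


def pixel_spectrum(cube: Cube, x: int, y: int) -> List[float]:
    """Spectrum recorded for spatial pixel (x, y)."""
    return cube[x][y]


def band_image(cube: Cube, band: int) -> List[List[float]]:
    """Spatial image image[x][y] at a single spectral band."""
    return [[spectrum[band] for spectrum in column] for column in cube]


if __name__ == "__main__":
    # Toy numbers: three motor steps, a two-pixel slit, and four bands.
    frames = [
        [[1, 2, 3, 4], [5, 6, 7, 8]],
        [[9, 10, 11, 12], [13, 14, 15, 16]],
        [[17, 18, 19, 20], [21, 22, 23, 24]],
    ]
    cube = assemble_cube(frames)
    print(pixel_spectrum(cube, 1, 0))  # [9, 10, 11, 12]
    print(band_image(cube, 0))         # [[1, 5], [9, 13], [17, 21]]
```

Indexing the cube by spectrum (fixed x, y) or by band (fixed λ) mirrors the two views the ERL-MHIS provides: a per-pixel spectrum and a per-wavelength image.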

The ERL-MHIS provides transmission and fluorescence images of the biopsy to assist pathologists in detecting cancerous cells or tissues. The transmission light source is a 100-W halogen lamp, and the fluorescence light source is a 75-W xenon lamp. The transmission image provides morphological information and spectral information from 400 to 1000 nm of the cell or tissue, whereas the fluorescence image provides the characteristic emission spectrum of the cell or tissue. The proposed system has two fluorescence excitation modes (F1: 330–385 nm; F2: 470–490 nm), selected by switching the fluorescent wheel. Figure 1(b) shows the assembled ERL-MHIS, which consists of an inverted microscope (Olympus IX71), a charge-coupled device (CCD; AVT PIKE F-421-C), the RL, a stepping motor (Sigma Koki SGSP20-20), a hyperspectrometer (Specim V10E, spectral range 400–1000 nm), and an EMCCD (Andor Luca R604, 1000 × 1000 pixels, 8-µm pixel size). The spatial resolution of the ERL-MHIS is 30 µm × 10 µm, and the objective power determines the resolution at the sample: with the 20× objective used in this work, it is 1.5 µm × 0.5 µm, i.e., 30 µm × 10 µm divided by the 20× magnification. The spectral resolution of the ERL-MHIS is about 2.8 nm. The software for acquiring images and analyzing spectral information was written in C; it controls the speed of the stepping motor and the gain and exposure time of the EMCCD.

2.2 Patient Biopsy Preparations

Sixty-eight biopsies from 34 oral carcinoma patients were provided by China Medical University Hospital (Tai-Chung City, Taiwan). The 68 biopsies were divided into 58 training cases and 10 test cases. The type of each test biopsy (normal or cancerous) was not revealed to the analyst who analyzed the 10 test biopsies. The 58 training cases were used to determine the most effective methods for each sampling condition and their cut-off points, and the performance of the proposed approach was validated using the 10 test cases.

Before the experiment, institutional review board (IRB) approval was obtained from China Medical University Hospital (IRB number DMR98-IRB-209). All patients received complete information on this experiment before providing their signed informed consent. This study was implemented in accordance with the Declaration of Helsinki.

The biopsies were prepared following the routine pathological diagnosis procedure with hematoxylin and eosin (H&E) staining. Oral carcinoma and normal samples were resected from the patients during surgery and then stained with H&E. Two biopsies (one normal and one cancerous) were prepared from each patient. The pathologist identified each biopsy as either normal or cancerous and marked the layers of the oral tissue (lamina propria and basal-cell layer) on the biopsy image. For each biopsy, one transmission image and two fluorescence images were acquired using the ERL-MHIS.

2.3 Morphology-Based Fractal Dimension Method for Tissue Discrimination

The fractal dimension is a statistical index of complexity that compares how the detail in a pattern changes with the scale at which it is measured [20]. In this work, the complexity of the border between the lamina propria and the basal-cell layer is represented by the fractal dimension of the tissue image. The fractal dimension is obtained by taking the limit, as the measurement scale approaches zero, of the ratio of the logarithmic change in object size to the logarithmic change in measurement scale:

$$D = \frac{\log N}{\log s},$$
(1)

where D denotes the fractal dimension, s denotes the length of the chosen smallest unit, and N denotes the number of units of size s required to cover the pattern. In this work, the size of a nucleus (nine pixels) is used as the smallest unit, and N is counted over the full image (1000 × 1000 pixels). The fractal dimension is calculated from the binary version of the transmission image.
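A minimal box-counting sketch of Eq. (1) follows. It assumes that the nine-pixel nucleus corresponds to a 3 × 3 box and reads s as the number of such boxes spanning the image side (1000/3 ≈ 333 for the images here); with s taken literally as a length of 3 pixels, the ratio would not fall between 1 and 2. Both readings are assumptions, and the function names are hypothetical.

```python
import math
from typing import List

Binary = List[List[int]]  # 1 = white (border) pixel, 0 = background


def count_boxes(img: Binary, box: int) -> int:
    """N in Eq. (1): number of box x box tiles containing a white pixel."""
    rows, cols = len(img), len(img[0])
    occupied = 0
    for r0 in range(0, rows, box):
        for c0 in range(0, cols, box):
            if any(img[r][c]
                   for r in range(r0, min(r0 + box, rows))
                   for c in range(c0, min(c0 + box, cols))):
                occupied += 1
    return occupied


def fractal_dimension(img: Binary, box: int = 3) -> float:
    """Single-scale estimate D = log N / log s.

    Assumption: s is the image side divided by the box side, i.e. how
    many boxes span the image (the longer side is used if not square).
    """
    s = max(len(img), len(img[0])) / box
    n = count_boxes(img, box)
    if n == 0:
        raise ValueError("binary image contains no white pixels")
    return math.log(n) / math.log(s)


if __name__ == "__main__":
    # A 300 x 300 image whose only white pixels form one straight row:
    # 100 occupied boxes on a side of 100 boxes, so D = 1.
    line = [[1] * 300] + [[0] * 300 for _ in range(299)]
    print(fractal_dimension(line, box=3))  # 1.0
```

Under this reading, a straight border gives D = 1 and a fully white image gives D = 2, so a disordered border that spreads over the image moves D toward 2, matching the behavior described in Sect. 3.1.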

The raw data of the transmission image must be calibrated before the fractal dimension is calculated. First, the dark noise of the system is removed using a dark image acquired with no illumination. The nonuniformity of the transmission image is then removed using a reference blank, acquired by scanning an area of the slide that contains all of the glass layers but no cell structures. The nonuniformity is caused by uneven illumination, scan-line striping, lamp characteristics, and the reflectance or transmittance of the glass. The calibration formula is:

$$T(\lambda ) = \frac{{I_{T} (\lambda ) - I_{dark} (\lambda )}}{{I_{white} (\lambda ) - I_{dark} (\lambda )}},$$
(2)

where T(λ) denotes the calculated transmittance of each pixel in the transmission image, I_T(λ) denotes the raw spectral intensity of each pixel in the transmission image, I_dark(λ) denotes the spectral intensity of each pixel in the dark field, and I_white(λ) denotes the spectral intensity of each pixel in the bright field.
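Eq. (2) is a standard dark-frame and flat-field correction. The sketch below applies it pixel by pixel; the list-based data layout and the function names are assumptions made for illustration.

```python
from typing import List, Sequence


def transmittance(raw: Sequence[float], dark: Sequence[float],
                  white: Sequence[float]) -> List[float]:
    """Eq. (2) for one pixel: T = (I_T - I_dark) / (I_white - I_dark), band by band."""
    if not (len(raw) == len(dark) == len(white)):
        raise ValueError("spectra must have the same number of bands")
    result = []
    for i_t, i_dark, i_white in zip(raw, dark, white):
        denom = i_white - i_dark
        if denom <= 0:
            # A blank that is not brighter than the dark image cannot serve as a reference.
            raise ValueError("reference blank must exceed the dark level")
        result.append((i_t - i_dark) / denom)
    return result


def calibrate_cube(raw_cube, dark_cube, white_cube):
    """Apply Eq. (2) to every pixel of an (x, y, band) cube stored as nested lists."""
    return [[transmittance(r, d, w) for r, d, w in zip(raw_col, dark_col, white_col)]
            for raw_col, dark_col, white_col in zip(raw_cube, dark_cube, white_cube)]
```

For example, `transmittance([60, 110], [10, 10], [110, 210])` yields 0.5 in both bands: the pixel transmits half of what the blank transmits once the dark offset is removed.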

Additionally, to obtain a binary image with a clear border between the basal-cell layer and the lamina propria, the transmission images in the wavelength range of 500–700 nm are superimposed, because the spectral intensity differs most between the basal-cell layer and the lamina propria in this range. Finally, the fractal dimension of the binary image is calculated using Eq. (1).
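The band superposition and binarization step can be sketched as follows. The paper does not state how the superimposed image is thresholded, so the rule here (pixels darker than the image mean are marked white) and the function names are hypothetical placeholders.

```python
from typing import List, Optional, Sequence

Cube = List[List[List[float]]]  # calibrated transmittance: cube[x][y][band]


def superimpose_bands(cube: Cube, wavelengths: Sequence[float],
                      lo: float = 500.0, hi: float = 700.0) -> List[List[float]]:
    """Sum the transmittance of every band whose wavelength lies in [lo, hi] nm."""
    idx = [i for i, wl in enumerate(wavelengths) if lo <= wl <= hi]
    if not idx:
        raise ValueError("no bands fall inside the requested range")
    return [[sum(spectrum[i] for i in idx) for spectrum in column] for column in cube]


def binarize(image: List[List[float]], threshold: Optional[float] = None) -> List[List[int]]:
    """Mark pixels below the threshold as white (1), all others as 0.

    Assumption: border pixels are the more absorbing ones, and the default
    threshold is the image mean; neither rule is taken from the paper.
    """
    values = [v for row in image for v in row]
    t = sum(values) / len(values) if threshold is None else threshold
    return [[1 if v < t else 0 for v in row] for row in image]
```

The output of `binarize` is the kind of 0/1 image that a box-counting estimate of Eq. (1) expects.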

2.4 Five Spectrum-Based Methods for Cell Discrimination

When oral dysplasia arises from the epithelial tissue, the number of nuclei in the basal-cell layer increases and the nuclear shape and size change [3, 4]. Moreover, according to previous studies [13–15], the nuclear spectrum differs quantitatively between normal, precancerous, and cancerous cells. Therefore, it is hypothesized that normal and cancerous cells differ in the nuclear spectrum of the basal-cell layer of the epithelial tissue. In this work, the analyzed spectral data are obtained from two fluorescence images (acquired under F1 and F2 excitation, respectively). Only well-stained nuclei are chosen, and nuclei at the border are excluded. Before analysis, the dark noise must be removed using F(λ) = I_F(λ) − I_dark(λ), where F(λ) denotes the fluorescence emission intensity of each pixel in the fluorescence image, I_F(λ) denotes the raw spectral intensity of each pixel in the fluorescence image, and I_dark(λ) denotes the spectral intensity of each pixel in the dark field. After the noise is removed, all eligible nuclei in the basal-cell layer are selected. Each nucleus comprises nine pixels, and the fluorescence emission spectrum of a nucleus is the average spectrum of its nine pixels.
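The dark correction and nine-pixel averaging for one nucleus might look like this. The sketch assumes the nine pixels of a nucleus have already been selected (well stained, away from the border) and that a matching dark spectrum is available for each pixel; the function names are hypothetical.

```python
from typing import List, Sequence

Spectrum = Sequence[float]


def remove_dark(raw: Spectrum, dark: Spectrum) -> List[float]:
    """F(lambda) = I_F(lambda) - I_dark(lambda) for one pixel."""
    if len(raw) != len(dark):
        raise ValueError("raw and dark spectra differ in length")
    return [r - d for r, d in zip(raw, dark)]


def nucleus_spectrum(raw_pixels: Sequence[Spectrum],
                     dark_pixels: Sequence[Spectrum],
                     pixels_per_nucleus: int = 9) -> List[float]:
    """Average the dark-corrected spectra of the pixels of one nucleus."""
    if len(raw_pixels) != pixels_per_nucleus or len(dark_pixels) != pixels_per_nucleus:
        raise ValueError(f"expected {pixels_per_nucleus} pixels per nucleus")
    corrected = [remove_dark(r, d) for r, d in zip(raw_pixels, dark_pixels)]
    n_bands = len(corrected[0])
    # Band-by-band mean over the nucleus pixels.
    return [sum(px[b] for px in corrected) / len(corrected) for b in range(n_bands)]
```

Averaging over the nine pixels gives one emission spectrum per nucleus, which is the unit on which the five spectral methods operate.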

Based on the characteristics of the emission spectral shape, normal and cancerous cells were distinguished using five methods (three for the spectrum obtained under F1 excitation and two for the spectrum obtained under F2 excitation). The emission spectrum under F1 excitation has two peaks and one valley (Fig. 3a: peak 1 at 560 nm, peak 2 at 705 nm, and the valley at 630 nm). Because normal and cancerous cells had different peak ratios or different valley values, the first method uses the peak ratio (PR), peak 1/peak 2, as the characteristic of spectral shape. The second method uses the peak and valley ratio (PVR): the spectrum is normalized by the intensity of peak 1, and (peak 1 × peak 2)/valley is used as the characteristic value. The third method uses the area under the spectral curve normalized by the intensity of peak 1 (AUS1). The emission spectrum under F2 excitation has only one peak (Fig. 3b: peak at 560 nm). The fourth method uses the area under the spectral curve normalized by the peak intensity (AUS2), and the fifth method uses the full width at half maximum (FWHM) of the spectral curve.
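The five characteristic values can be computed from a nuclear spectrum as sketched below. This illustration assumes the peak and valley intensities are read at the fixed wavelengths given above (560, 705, and 630 nm under F1; 560 nm under F2) by linear interpolation, that the area is a trapezoidal integral, and that the FWHM is measured around the global maximum; these choices and the function names are assumptions, not the authors' implementation.

```python
from typing import Dict, Sequence


def value_at(wl: Sequence[float], spec: Sequence[float], target: float) -> float:
    """Linearly interpolate the spectrum at a target wavelength (nm)."""
    for i in range(len(wl) - 1):
        if wl[i] <= target <= wl[i + 1]:
            frac = (target - wl[i]) / (wl[i + 1] - wl[i])
            return spec[i] + frac * (spec[i + 1] - spec[i])
    raise ValueError(f"{target} nm is outside the recorded range")


def area(wl: Sequence[float], spec: Sequence[float]) -> float:
    """Trapezoidal area under the spectral curve."""
    return sum((spec[i] + spec[i + 1]) * (wl[i + 1] - wl[i]) / 2
               for i in range(len(wl) - 1))


def fwhm(wl: Sequence[float], spec: Sequence[float]) -> float:
    """Full width at half maximum around the global maximum (linear interpolation)."""
    k = max(range(len(spec)), key=spec.__getitem__)
    if spec[k] <= 0:
        raise ValueError("spectrum has no positive peak")
    half = spec[k] / 2
    i = k
    while i > 0 and spec[i - 1] > half:              # walk left to the crossing
        i -= 1
    j = k
    while j < len(spec) - 1 and spec[j + 1] > half:  # walk right to the crossing
        j += 1
    if i == 0 or j == len(spec) - 1:
        raise ValueError("spectrum does not fall to half maximum on both sides")
    left = wl[i - 1] + (half - spec[i - 1]) * (wl[i] - wl[i - 1]) / (spec[i] - spec[i - 1])
    right = wl[j] + (spec[j] - half) * (wl[j + 1] - wl[j]) / (spec[j] - spec[j + 1])
    return right - left


def f1_features(wl, spec, peak1=560.0, peak2=705.0, valley=630.0) -> Dict[str, float]:
    """PR, PVR, and AUS1 for a spectrum recorded under F1 excitation."""
    p1, p2, v = (value_at(wl, spec, w) for w in (peak1, peak2, valley))
    normalized = [s / p1 for s in spec]  # peak 1 becomes 1
    return {
        "PR": p1 / p2,
        # (peak 1 x peak 2) / valley evaluated on the peak-1-normalized spectrum
        "PVR": (1.0 * (p2 / p1)) / (v / p1),
        "AUS1": area(wl, normalized),
    }


def f2_features(wl, spec, peak=560.0) -> Dict[str, float]:
    """AUS2 and FWHM for a spectrum recorded under F2 excitation."""
    p = value_at(wl, spec, peak)
    return {"AUS2": area(wl, [s / p for s in spec]), "FWHM": fwhm(wl, spec)}
```

Because peak 1 is normalized to 1, the PVR reduces to peak 2/valley of the raw spectrum; the code keeps the written form so that it reads term by term like the definition above.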

Each method yields one characteristic value per cell. For each patient (two biopsies) and each method, the characteristic values of normal and cancerous cells were plotted as two distributions. The cut-off point and optimal performance (sensitivity and specificity) of each method were determined using the receiver operating characteristic (ROC) curve [23]. Sensitivity and specificity measure the inherent validity of a diagnostic method with dichotomous results [24, 25]; details on how to calculate them can be found elsewhere [25]. Sensitivity is the proportion of actual positives that are correctly identified as positive, that is, the percentage of cancerous cells that are correctly identified. Specificity is the proportion of actual negatives that are correctly identified, that is, the percentage of normal cells that are correctly identified. However, both depend on the cut-off point used to define “positive” and “negative” test results, and they shift when the cut-off point shifts. The ROC curve delineates the trade-off between sensitivity and (1 − specificity) across a series of cut-off points [24].
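The text does not state which point on the ROC curve was taken as the cut-off. The sketch below assumes Youden's index (sensitivity + specificity − 1), a common choice, and lets the caller state whether larger characteristic values indicate cancer; both the criterion and the function names are assumptions.

```python
from typing import List, Sequence, Tuple


def sens_spec(normal: Sequence[float], cancer: Sequence[float], cut: float,
              higher_is_cancer: bool = True) -> Tuple[float, float]:
    """Sensitivity (cancer cells called cancer) and specificity (normal cells called normal)."""
    if higher_is_cancer:
        tp = sum(v >= cut for v in cancer)
        tn = sum(v < cut for v in normal)
    else:
        tp = sum(v <= cut for v in cancer)
        tn = sum(v > cut for v in normal)
    return tp / len(cancer), tn / len(normal)


def roc_points(normal, cancer, higher_is_cancer=True) -> List[Tuple[float, float, float]]:
    """(cut-off, sensitivity, specificity) at every observed value."""
    cuts = sorted(set(normal) | set(cancer))
    return [(c, *sens_spec(normal, cancer, c, higher_is_cancer)) for c in cuts]


def best_cutoff(normal, cancer, higher_is_cancer=True) -> Tuple[float, float, float]:
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    return max(roc_points(normal, cancer, higher_is_cancer),
               key=lambda p: p[1] + p[2] - 1)


if __name__ == "__main__":
    # Toy characteristic values for one patient and one method.
    print(best_cutoff([1.0, 1.1, 1.2], [1.5, 1.6, 1.7]))  # (1.5, 1.0, 1.0)
```

Sweeping every observed value traces the ROC curve point by point, and the returned tuple gives the cut-off together with the sensitivity and specificity it achieves.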

2.5 Spectrum-Based Cocktail Approach for Cell Discrimination

This section describes how the cocktail approach determines which spectral methods are effective for each patient. Previous works [15, 21, 22] have established that lesion site, tumor size, age, and lymph node metastasis affect the cell spectrum. Therefore, in this work, the patient data were sorted according to age, lesion site, the size or direct extent of the primary tumor (T), and the degree of spread to regional lymph nodes (N). A method whose mean sensitivity and mean specificity both exceeded 80 % was defined as effective for a specific condition. For example, three patients (patients 18, 25, and 28) were 30–39 years old. The mean sensitivity and mean specificity of the PR method for these three patients were 76.8 and 73.46 %, respectively (Table 2), so the PR method is not effective for patients in this age range. In contrast, the mean sensitivity and mean specificity of the PVR method for these patients were 91.43 and 83.6 %, respectively, and the method was therefore considered effective. Once the effective methods were determined for each condition, their combination was used to diagnose a biopsy according to its conditions: each cell is screened with all of the methods that are effective for the sample and is then diagnosed as normal or cancerous. Figure 4 (Sect. 3.2) shows the flowchart of how the cocktail approach finds the effective methods for each condition.
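The 80 % effectiveness rule can be expressed as a small filter. The function name and input layout are hypothetical; the example reuses only the group means quoted above for patients 18, 25, and 28, passing each as a single pair so that its mean is that value.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def effective_methods(results: Dict[str, Sequence[Tuple[float, float]]],
                      threshold: float = 80.0) -> List[str]:
    """Methods whose mean sensitivity AND mean specificity (in %) exceed the threshold.

    results maps a method name to (sensitivity, specificity) pairs, one per
    patient who shares the condition being evaluated.
    """
    chosen = []
    for method, pairs in results.items():
        if not pairs:
            continue
        mean_sens = mean(s for s, _ in pairs)
        mean_spec = mean(p for _, p in pairs)
        # Strictly "higher than" the threshold, as stated in the text.
        if mean_sens > threshold and mean_spec > threshold:
            chosen.append(method)
    return chosen


if __name__ == "__main__":
    age_30_39 = {"PR": [(76.8, 73.46)], "PVR": [(91.43, 83.6)]}
    print(effective_methods(age_30_39))  # ['PVR']
```

Running the same filter once per condition (age group, lesion site, T, N) produces the condition-to-method mapping that Table 3 summarizes.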

2.6 Combined Diagnosis Based on Fractal Dimension and the Cocktail Approach

The fractal dimension is first used to determine whether the biopsy tissue is normal or abnormal. The cocktail approach then determines whether the cells are normal or cancerous. Together, the two steps classify the biopsy as normal or cancerous.
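The two-step diagnosis can be sketched as below, using the 1.73 fractal dimension criterion reported in Sect. 3.1. The paper does not specify how several effective methods are combined for one cell or how cell-level results are aggregated into a biopsy-level call, so the unanimous cell rule, the `min_fraction` parameter, and all names here are hypothetical.

```python
from typing import Callable, Dict, Sequence

FD_CUTOFF = 1.73  # Sect. 3.1: tissue below this fractal dimension is called normal

# Hypothetical type: a test maps one method's characteristic value to True
# when the value falls on the "cancerous" side of that method's cut-off.
CellTest = Callable[[float], bool]


def cell_is_cancerous(features: Dict[str, float], tests: Dict[str, CellTest]) -> bool:
    """Assumed rule: a cell is cancerous only if every effective method flags it."""
    return all(test(features[method]) for method, test in tests.items())


def diagnose_biopsy(fractal_dim: float,
                    cells: Sequence[Dict[str, float]],
                    tests: Dict[str, CellTest],
                    min_fraction: float = 0.5) -> str:
    """Morphological screen first, then the cocktail spectral screen.

    min_fraction (share of flagged cells needed to call the biopsy
    cancerous) is a hypothetical parameter, not a rule from the paper.
    """
    if fractal_dim < FD_CUTOFF:
        return "normal"
    if not cells:
        return "abnormal, not confirmed cancerous"
    flagged = sum(cell_is_cancerous(c, tests) for c in cells)
    if flagged / len(cells) >= min_fraction:
        return "cancerous"
    # Abnormal morphology without a spectral confirmation, e.g. hyperplasia.
    return "abnormal, not confirmed cancerous"
```

The `tests` dictionary would hold only the methods that are effective for the sample's conditions, each with the optimal cut-off for that condition.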

3 Results and Discussion

3.1 Morphological Identification Between Normal and Abnormal Tissues

Table 1 shows information on the biopsies. Figure 2 displays the biopsy images of patient 7. Pathologists can obtain more information about the biopsy (e.g., the differentiation and the stage) from the transmission images (Fig. 2a, e).

Table 1 Patient information

Fig. 2 Biopsy images of patient 7. (a) Transmission image of normal tissue. Fluorescence images of normal tissue under (b) F1 and (c) F2 excitation. (d) Binary image of normal tissue for calculating the fractal dimension. (e) Transmission image of cancerous tissue. Fluorescence images of cancerous tissue under (f) F1 and (g) F2 excitation. (h) Binary image of cancerous tissue for calculating the fractal dimension

In this work, the complexity of the border between the basal-cell layer and the lamina propria was represented using the fractal dimension (Fig. 2a). Normal tissues had a clear border between the basal-cell layer and the lamina propria, whereas in abnormal tissues the cells encroached on other cells, leading to a disordered border [4]. Figures 2d and 2h show the binary images, in which the white area represents the border. The white area in the normal tissue is a continuous curve, whereas the white area in the abnormal tissue consists of discontinuous curves spread over the entire image. Therefore, in the binary image, the border of abnormal tissue was more complex than that of normal tissue. Because a binary image containing many white areas has a high fractal dimension, the fractal dimension of abnormal tissue was higher than that of normal tissue. Column 1 of Table 2 shows that the fractal dimension criterion for discriminating between normal and abnormal tissues is 1.73: tissue with a lower value was diagnosed as normal, and tissue with a value of 1.73 or higher was diagnosed as abnormal. Notably, the chosen nuclei were well stained and circular. In addition, the fractal dimension changed when the objective power was altered, because magnification changes the apparent complexity of the border; the fractal dimension was high when the binary image of the magnified tissue included many white areas.

Table 2 Fractal dimension values of normal and cancerous tissues

3.2 Spectral Identification Between Normal and Cancerous Cells

Because the ERL-MHIS provides the fluorescence spectrum of cell nuclei from the fluorescence images (Fig. 2b, c, f, and g), this work examined whether the spectra of normal and cancerous cells in the basal-cell layer differ. Previous works [11–13] have established that the nuclear spectra at the normal, precancerous, and cancerous stages are different; hence, the nuclear spectrum was used to represent the spectrum of each cell. Figure 3 displays typical cell spectra of various cancer stages obtained with the ERL-MHIS. Under F1 excitation (330–385 nm), the fluorescence emission spectrum had two peaks and one valley: peak 1 at 560 nm, peak 2 at 705 nm, and the valley at 630 nm. When the spectra were normalized to the peak 1 intensity, the peak 2 intensity differed between stages; the peak 2/peak 1 intensity ratio of normal cells was lower than that of cancerous cells. This finding is consistent with that of Roblyer et al. [26]. The peak 2 difference between normal and cancerous cells has been attributed to differences in porphyrin concentration [27, 28], and Ramanujam et al. attributed the valley difference to cell metabolism. Under F2 excitation (470–490 nm), the fluorescence emission spectrum had only one peak, at 560 nm. When the peak was normalized, the stages differed in the FWHM of the spectrum [27]; this difference can be used to monitor changes in FAD concentration [29].

Fig. 3 Differences in the mean fluorescence emission spectra of cells between normal tissue and various oral cancer stages on the tongue (patients 7, 12, 24, and 26): mean fluorescence emission spectra of normal and cancer cells under (a) F1 and (b) F2 excitation

To distinguish between normal and cancerous cells by their spectral differences, the spectral shape was characterized using five methods: PR, PVR, AUS1, AUS2, and FWHM. The performance of each method was evaluated using the ROC curve, from which the cut-off point, sensitivity, and specificity were also determined (Table 2). Notably, under F1 excitation, although the normal cells of each patient had similar emission spectral shapes, the 29 patients differed in the intensities of the two peaks or the valley. Under F2 excitation, these patients differed slightly in the FWHM of normal cells. The cancerous cells exhibited the same inter-patient variation under both excitations. Consequently, the diagnostic performance of the five spectral methods was not ideal: each method was suitable for some patients but not for others. For example, the PR method yielded good results for patient 3 but not for patient 9. Hence, all five methods showed large standard deviations across patients.

To compensate for the differences in the nuclear spectrum between patients, this work classified the data of the 29 patients according to lesion site, tumor size, age, and lymph node metastasis. The proposed cocktail approach was then used to determine which spectral methods were effective under each sampling condition. Figure 4 shows how the cocktail approach determines the most effective methods for each sampling condition, and Table 3 lists the most effective methods for each condition. These methods can be combined according to the sampling conditions to diagnose a sample. For example, the combination of the AUS1 and FWHM methods was used to diagnose sample 1 (71 years old, lesion on the tongue, T2, and N2).

Fig. 4 Flowchart of the combination (cocktail) approach for determining the most effective methods

Table 3 Correlation between effective methods and patients’ conditions

Before a cell is diagnosed using the most effective method, the optimal cut-off point of the method must be determined. The optimal cut-off point was defined as the mean of the individual patients’ cut-off points under a specific condition. For example, for patients 18, 25, and 28 (age: 30–39 years), PVR was the most effective method, and its optimal cut-off point under this condition was the mean of the three patients’ PVR cut-off points (1.55, 1.23, and 1.23; Table 2). Moreover, each effective method had different optimal cut-off points under different conditions, because each condition had a different patient group. For example, the patient groups for T1 and T2 were different, so the optimal cut-off points of AUS1 differed for T1 and T2.
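As a worked example of this definition, the optimal PVR cut-off for the 30–39-year group follows directly from the three patient cut-offs quoted above; the symbol $\bar{c}_{\mathrm{PVR}}$ is introduced here only for illustration:

$$\bar{c}_{\mathrm{PVR}} = \frac{1.55 + 1.23 + 1.23}{3} = \frac{4.01}{3} \approx 1.34$$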

3.3 Combined Diagnosis

The morphological changes of the tissue were diagnosed using the fractal dimension. Although this method can detect abnormal tissue, abnormal tissue is not necessarily cancerous; it may be hyperplastic. In hyperplastic tissue, the number of cells increases and the border between the basal-cell layer and the lamina propria becomes disordered [4], so its fractal dimension also differs from that of normal tissue. Therefore, whether the cells were cancerous or normal was further confirmed using the cocktail approach.

Figure 5 compares the performance of all methods. Because each individual method was appropriate for only some patients, its standard deviation was large. In contrast, the cocktail approach showed high mean sensitivity, high mean specificity, and a small standard deviation, indicating that it accommodates inter-patient variation in the sample data better. In addition, the 14 patients with early-stage oral carcinoma were successfully diagnosed with a sensitivity of 90 ± 4.65 % and a specificity of 87.2 ± 5.06 %. Moreover, the 10 test samples were used to validate the training results (Table 3). The type of each test sample (normal or cancerous) was not revealed to the analyst during the analysis, and the samples were diagnosed according to the patients’ conditions. The sensitivity and specificity for the 10 test samples were 80.16 ± 4.5 % and 81.74 ± 2.26 %, respectively. Notably, the test nuclei were chosen from well-stained cells away from the border. The results could be improved by specifying the sample conditions more accurately; the concentration of the H&E stain and the staining time are likely to be key conditions.

Fig. 5 Comparison of the performance of all methods: (a) mean sensitivity and (b) mean specificity of the 29 patients for each method

In clinical application, the proposed approach can help pathologists diagnose biopsies quantitatively. The sample preparation and examination procedure is the same as that used for pathological examination in a clinic, and the diagnostic procedure is automated in software, so the approach is convenient for pathologists. In addition, this is the first application of the ERL-MHIS to the diagnosis of oral carcinoma biopsies. This study also provides a condition-based categorization of cellular spectra that enhances diagnostic performance for clinical oral carcinoma research, and the cut-off point of each method can serve as reference data for other researchers. Moreover, this study shows that the spectrum of oral carcinoma cells relates not only to the cancer stage but also to the patient’s conditions.

4 Conclusion

This work developed morphological and spectral methods and combined them to help pathologists diagnose oral carcinoma biopsies quantitatively. Sixty-eight biopsies from 34 oral carcinoma patients were diagnosed using the ERL-MHIS; this is the first work to apply the ERL-MHIS to the cytopathological examination of oral carcinoma. The fractal dimension algorithm is applied to discriminate between normal and abnormal tissues based on morphological differences. For spectral discrimination, normal and cancerous cells are distinguished using five methods. Because the spectra of normal and cancerous cells vary between patients, the diagnostic results of these five methods are not ideal. Therefore, the proposed cocktail approach is used to determine which spectral methods are effective under each patient condition, and a combination of the effective methods that depends on the patient’s conditions is then used to diagnose a biopsy. In addition to improving the mean sensitivity and mean specificity, the cocktail approach reduces the standard deviation. Moreover, this study successfully diagnosed oral carcinoma at an early stage. In future work, the k-nearest neighbor method or principal component analysis will be used to find characteristic molecules of different carcinoma stages, and light-emitting diodes can be used as the light source of the ERL-MHIS to reduce scanning time.