
1 Introduction

The present work is situated at the intersection of three significant innovation trends; it seeks to exploit the opportunities they offer in order to propose a solution for direct pathogen identification on bacteria culturing plates. First, the concrete possibility of (deeply) learning salient visual features has determined, in recent years, the success of Deep Learning (DL) architectures. For visual recognition tasks, DL models are normally implemented with Convolutional Neural Networks (CNNs) [1] for low-to-high level visual feature learning. This is currently influencing, if not significantly impacting, several application domains. In the biomedical field, a transition from handcrafted to learned feature-based approaches can bring significant benefits, especially when high data throughput and visual content variability are involved [2]. However, data dimensionality (biomedical data are often 3D or higher dimensional) introduces further challenges for DL solutions, which have been only partially addressed so far.

The second trend we consider is the increasing attention paid to small-scale applications of hyperspectral imaging (HSI) in several domains, such as industrial quality control (especially food, pharma and chemical [3, 4]), cultural heritage preservation [5] and a number of biomedical applications [6]. What currently contributes to the proliferation and diversification of small-scale HSI applications, in addition to the classical Remote Sensing (RS) ones, is the increasing technological variety and ever lower cost of acquisition equipment [7, 8] and, as for DL, the continued increase in computational power and in the storage/transmission capabilities of computing hardware and networks. In many situations where visual analysis is limited by spatial-spectral resolution trade-offs, the alternative or concurrent use of HSI acquisition systems can play a decisive role in improving data interpretation. However, the limited number of available non-RS datasets still hinders the popularity of HSI data acquisition and analysis research for non-RS applications.

The third trend we consider defines the application context of our work. It is related to a recent digitization trend significantly impacting the field of Clinical Microbiology (CM), namely the growing diffusion of Full Laboratory Automation (FLA). An FLA system is capable of handling all phases of bacterial colony culturing, from the processing of various human-collected specimens through seeding and streaking on culturing plates (Petri dishes), to automatic incubation and further processing for subsequent analysis [9, 10]. All relevant phases of bacterial colony growth can be captured by digital cameras, visualized on diagnostic workstations, stored/communicated and processed. This has led to the advent of Digital Microbiology [11, 12] and to a fundamentally new way of working for microbiologists.

In Digital Microbiology Imaging (DMI), image-based decision making can be automated for certain tasks or support the work of the microbiologist for others. One of the most impactful capabilities (not yet provided by commercial products) would be reliable and fast identification of bacterial species by direct image analysis and machine learning solutions. Early identification of bacterial species is needed to determine the correct therapy for the patient, with a potentially significant impact on life expectancy. In addition, early identification is one of the most powerful ways to counter the worldwide threat of antibiotic resistance. This is especially true if one considers very general and massive diagnostic investigation procedures such as screening for urinary tract infection (UTI) pathogen identification [13]. UTIs are widespread and serious health problems that affect many millions of people around the world every year, accounting for a significant part of CM labs’ workload [14]. Unfortunately, presumptive identification by visual inspection of UTI pathogens on the most widely used culturing media (e.g. blood agar) can be a very complex and ambiguous task, even for highly skilled microbiologists (examples of different pathogens exhibiting high visual similarity are showcased in Fig. 1). This is the reason why, despite their higher cost, chromogenic media [15] have gained widespread market diffusion, thanks to their ability to mark different colonies with different colors (through the use of pathogen-specific enzyme substrates). However, these media have several limitations in terms of the number of pathogens that can be differentiated [16]. HSI technology could provide support where trichromatic imaging does not give enough spectral information for reliable discrimination. UTI identification is therefore a good case study for HSI-based bacteria identification, because it represents a diagnostic context involving, for a single laboratory, hundreds of analyses per day, so that a technology investment can be rapidly amortized.

Fig. 1. Examples of different UTI bacteria colonies grown on blood agar media.

There are still very few examples of DL-based approaches for HSI data analysis in RS applications [17,18,19,20] and, to our knowledge, still none in other fields, including biomedicine. Moreover, though both conventional machine learning [21,22,23] and DL solutions [24] have already been implemented for DMI analysis tasks, and hyperspectral classification has already been explored in CM [25,26,27,28,29], the present work is the first attempt to combine HSI, CM-FLA and CNN for direct bacterial identification purposes. In this work, we want to exploit the enhanced spectral information coming from HSI acquisitions to prove the feasibility of reliable bacterial species discrimination based on a DL approach. We raise the complexity of the problem compared to our preliminary study [27] by increasing the number of pathogens, building a larger HSI UTI dataset (made available online) and exploiting an improved acquisition setup. Unlike typical Remote Sensing techniques, which seek to increase the spatial consistency of the spectral classification at a pixel level [30], our pathogen recognition takes place on each single bacterial colony growing on the agar substrate. To this end, we propose a new (with respect to [27]) spatial-spectral distance measure to extract Colony Spectral Signatures (CSS). We designed and trained a 1D-CNN acting on CSS for pathogen identification and compared it against conventional machine learning approaches, carefully selected and designed for the same purpose [29]. In particular, classification accuracy, computational efficiency and scalability comparisons are presented, along with examples and further considerations.

2 Proposed Method

A general scheme of the proposed HSI processing and classification workflow for rapid UTI bacteria discrimination is given in Fig. 2. In describing the various stages of our system, we place more emphasis on the novel CNN-based solution for CSS discrimination. Details about the other parts, the HSI database and the conventional handcrafted feature-based classification solutions can be found in [29].

Fig. 2. Processing and classification pipeline.

Hyperspectral Acquisition System. The HSI target is a 90 mm diameter Petri plate. The main parts of the acquisition system are: (1) HSI camera: a linear VNIR camera (Specim Spectral Camera V10E) with a spectral range between 400 and 1000 nm and telecentric fore lenses (Specim OLE23, focal length 23 mm); spatial resolution has been doubled with respect to [27] (640\(\,\times \,\)600 pixels) while keeping the scanning time under 15 seconds (compatible with FLA needs). (2) Illumination system: the light of two 150 W halogen lamps is conducted by two 13 mm-diameter optical fibers, spread by cylindrical lenses and finally reflected to the inner side of a semi-cylindrical dome. This configuration avoids total reflection effects on translucent colonies. (3) Conveyor system: a conveyor sliding system, mounting a shuttle which accommodates both the plate and a calibration bar (coated with \(BaSO_4\) optopolymer), allows push-broom plate acquisition and a per-sample radiometric calibration.

Colony Spectral Signature (CSS) Extraction. Flat-field calibration was applied to the hypercube to derive a normalized (with respect to a white calibration bar) relative reflectance measure \(R_{i,\lambda }\):

$$\begin{aligned} R_{i,\lambda } = \frac{S_{i,\lambda }-D_{i,\lambda }}{W_{i,\lambda }-D_{i,\lambda }} \end{aligned}$$
(1)

where \(S_{i,\lambda }\) is the acquired (raw) sample signal, while \(W_{i,\lambda }\) and \(D_{i,\lambda }\) are the white calibration and the dark current profiles, with i the spatial index and \(\lambda \) the spectral one. A signal-preserving Savitzky-Golay [31] denoising (window size of 7) is then applied. Since the illumination power of the halogen sources decreases at the extremes of the spectrum, the corresponding bands were cut off, preserving those with the highest SNR, in the range from 430 nm to 780 nm (for a total of 125 spectral bands). Then, a threshold-based foreground extraction is performed on the spectral band at wavelength 520 nm. This produces a reliable isolation of the grown colonies because, at this specific wavelength, the contrast between the relative reflectances of pathogens and blood agar is greater than in other bands. At this point, a spatial distance transform is calculated for each colony using a spectral cosine distance map, computed as:
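The calibration, denoising and band selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the acquisition software used in this work: the array layout (scan rows, spatial columns, bands), the polynomial order of the Savitzky-Golay filter and the synthetic wavelength grid are all hypothetical; only the window size of 7 and the 430 to 780 nm range come from the text.

```python
import numpy as np
from scipy.signal import savgol_filter


def relative_reflectance(S, W, D, eps=1e-12):
    """Flat-field calibration of Eq. (1): R = (S - D) / (W - D).

    S: raw hypercube, shape (rows, cols, bands).
    W, D: white and dark profiles, shape (cols, bands); they broadcast
    over the scan (rows) axis. This array layout is an assumption.
    """
    return (S - D) / np.maximum(W - D, eps)


def denoise_and_crop(R, wavelengths, lo=430.0, hi=780.0, window=7, polyorder=2):
    """Savitzky-Golay smoothing along the spectral axis, then band selection.

    window=7 comes from the text; polyorder=2 is an assumed value.
    """
    smoothed = savgol_filter(R, window_length=window, polyorder=polyorder, axis=-1)
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return smoothed[..., keep], wavelengths[keep]


# Synthetic check: a flat 50% reflector seen under uneven illumination.
wavelengths = np.linspace(400.0, 1000.0, 200)  # hypothetical band centres (nm)
D = np.full((5, 200), 0.1)
W = 0.5 + 0.4 * np.linspace(0.0, 1.0, 200) * np.ones((5, 200))
S = D + 0.5 * (W - D) * np.ones((4, 5, 200))
R, kept = denoise_and_crop(relative_reflectance(S, W, D), wavelengths)
```

Because the white and dark profiles are divided out per spatial position and per band, the uneven illumination in the synthetic example cancels and the recovered reflectance is flat.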

$$\begin{aligned} 1 - \frac{\mathbf {u} \cdot \mathbf {v}}{{||\mathbf {u}||}_{2}{||\mathbf {v}||}_{2}}, \end{aligned}$$
(2)

between each pixel signature \(\mathbf {v}\) and the agar footprint \(\mathbf {u}\), obtained by averaging the spectral signatures of background pixels. We use the resulting map as an elevation map for a reliable watershed segmentation of bacterial colonies. For each detected colony we then extract a representative spectral signature (a 125-dimensional vector), where we set colony pixel weighting factors proportional to the previously computed cosine distance map:
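As an illustration of Eq. (2), the cosine distance map can be computed with a few NumPy operations. This is a sketch only: the function name, the array layout and the boolean background mask are assumptions, and the subsequent watershed segmentation is not reproduced here.

```python
import numpy as np


def cosine_distance_map(R, background_mask, eps=1e-12):
    """Spectral cosine distance (Eq. 2) between each pixel and the agar footprint.

    R: relative reflectance cube, shape (rows, cols, bands).
    background_mask: boolean array (rows, cols), True on agar pixels.
    """
    u = R[background_mask].mean(axis=0)          # agar footprint (bands,)
    dot = R @ u                                  # per-pixel dot product
    norms = np.linalg.norm(R, axis=-1) * np.linalg.norm(u)
    return 1.0 - dot / np.maximum(norms, eps)


# Toy example: agar spectrum everywhere except one "colony" pixel.
agar = np.array([1.0, 2.0, 3.0])
R_toy = np.tile(agar, (3, 3, 1))
R_toy[1, 1] = [3.0, 2.0, 1.0]    # spectrally different pixel
R_toy[0, 0] = 2.0 * agar         # brighter agar: same spectral shape
mask = np.ones((3, 3), dtype=bool)
mask[1, 1] = False
dist = cosine_distance_map(R_toy, mask)
```

The distance depends only on the angle between spectra, so a pixel that is merely brighter or darker than the agar (same spectral shape) still gets a distance close to zero.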

$$\begin{aligned} \mathbf {CSS_{colony}} = {\sum \limits _{p \in P} w_p \cdot \mathbf {R}_p} \quad \quad \in \quad \mathbb {R}^{125} \end{aligned}$$
(3)

with P the set of colony pixels, \(w_p\) the weighting factor for the pixel p and \(\mathbf {R}_p\) the relative reflectance spectrum of the pixel p. Representative CSSs for each pathogen (the list is given in Sect. 3) are shown in Fig. 3 (left), along with their standard deviation (shaded).
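The weighted sum of Eq. (3) can be sketched as below. The text states only that the weights are proportional to the cosine distance map; normalizing them so that they sum to one within each colony, and falling back to uniform weights when all distances are zero, are assumptions made for this example.

```python
import numpy as np


def colony_signature(R, dist_map, colony_mask):
    """Colony Spectral Signature (Eq. 3) as a distance-weighted sum of pixel spectra.

    R: relative reflectance cube, shape (rows, cols, bands).
    dist_map: cosine distance map, shape (rows, cols).
    colony_mask: boolean array (rows, cols), True on the pixels of one colony.
    """
    spectra = R[colony_mask]                  # (n_pixels, bands)
    w = dist_map[colony_mask].astype(float)   # weights proportional to distance
    total = w.sum()
    if total > 0:
        w = w / total                         # assumption: weights sum to 1
    else:
        w = np.full(w.shape, 1.0 / w.size)    # assumption: uniform fallback
    return w @ spectra


# Toy example: two colony pixels with flat spectra of level 1 and 3.
R_toy = np.zeros((1, 2, 4))
R_toy[0, 0] = 1.0
R_toy[0, 1] = 3.0
dist_toy = np.array([[0.1, 0.3]])
css = colony_signature(R_toy, dist_toy, np.ones((1, 2), dtype=bool))
```

With this weighting, pixels whose spectra differ most from the agar footprint contribute most to the signature, which limits the influence of pixels affected by spectral mixing with the growing medium.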

Fig. 3. Average spectral signatures of UTI bacteria with their standard deviations (left) and selected CNN structure (right).

Classification Methods. CNN architectures [1] have been related to models of the visual cortex [32] and are characterized by locally overlapping connections (receptive fields) and shared weights, implemented within a stacked hierarchy (from low to high level visual tasks) of convolutional feature extraction layers alternated with pooling layers (usually exploiting a max pooling rule). This is followed by one or more fully connected classification layers. Non-linear activation function layers typically follow the convolutional ones, and the whole network produces a differentiable score function that allows the network parameters (weights and biases of the convolutional and fully connected layers) to be learned. Unlike spatial-spectral 3D-CNN configurations [33], which are more susceptible to overfitting and therefore need dedicated regularization strategies, we exploit a 1D-CNN configuration similar to that considered in [33, 34] for RS hypercubes. However, instead of considering single pixel spectra, we take advantage of the proposed spatial-spectral processing so that the CNN takes the extracted CSS as input and produces colony-based class scores as output. Our network topology, see Fig. 3 (right), contains 2 convolutional layers, 1 pooling layer, 1 fully connected layer and a final probability-based (softmax) classifier layer, for a total of 1,905,496 network parameters to learn. The first convolutional layer evaluates 32 feature maps from the 125-dimensional CSS input; for each map, a 5-tap filter is trained to produce a same-size output. The second convolutional layer has a similar structure and is composed of 64 feature maps (again with 5-tap filters). Parametric Rectified Linear Units (PReLU) were used as activation functions [35]. After the two convolutional layers, a max pooling layer halves the size of the feature maps, which are then fed to a fully connected layer composed of 500 units, followed in turn by 9 output units. The selection of the above CNN structure was based on the evaluation of many alternatives, obtained by changing the number of convolutional and fully connected layers, the learning rate and the learning decay value (see Sect. 3). We implemented the whole structure in Python 2.7 and TensorFlow 1.0 [36].
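To make the data flow through the described topology concrete, the following NumPy sketch runs a forward pass with randomly initialized weights. It is not the TensorFlow implementation used in this work. Several details are assumptions: zero padding for the same-size convolutions, a non-overlapping pooling window of 2 that drops the last odd sample, a single shared PReLU slope initialized to 0.25, PReLU after the fully connected layer, and the weight initialization. It does not attempt to reproduce the reported parameter count, which depends on details not stated in the text.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1D cross-correlation with zero 'same' padding and stride 1.

    x: (length, c_in); w: (taps, c_in, c_out); b: (c_out,).
    """
    taps = w.shape[0]
    pad = taps // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # windows[l, k, c] = xp[l + k, c]
    windows = np.stack([xp[k:k + x.shape[0]] for k in range(taps)], axis=1)
    return np.einsum("lkc,kco->lo", windows, w) + b


def prelu(x, alpha):
    return np.where(x > 0, x, alpha * x)


def max_pool2(x):
    """Non-overlapping max pooling of size 2 along the spectral axis."""
    n = x.shape[0] // 2
    return x[:2 * n].reshape(n, 2, -1).max(axis=1)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, n_bands=125, n_classes=9, scale=0.05):
    flat = (n_bands // 2) * 64  # pooled length times number of feature maps
    return {
        "w1": rng.normal(0.0, scale, (5, 1, 32)), "b1": np.zeros(32),
        "w2": rng.normal(0.0, scale, (5, 32, 64)), "b2": np.zeros(64),
        "w3": rng.normal(0.0, scale, (flat, 500)), "b3": np.zeros(500),
        "w4": rng.normal(0.0, scale, (500, n_classes)), "b4": np.zeros(n_classes),
        "alpha": 0.25,  # assumed shared PReLU slope
    }


def forward(css, p):
    """Map one 125-band CSS to class probabilities."""
    x = css[:, None]                                           # (125, 1)
    x = prelu(conv1d_same(x, p["w1"], p["b1"]), p["alpha"])    # (125, 32)
    x = prelu(conv1d_same(x, p["w2"], p["b2"]), p["alpha"])    # (125, 64)
    x = max_pool2(x).ravel()                                   # (62 * 64,)
    x = prelu(x @ p["w3"] + p["b3"], p["alpha"])               # (500,)
    return softmax(x @ p["w4"] + p["b4"])                      # (9,)


rng = np.random.default_rng(0)
params = init_params(rng)
probs = forward(rng.random(125), params)
```

Since the input is a single signature per colony rather than a spatial patch, the whole network operates along the spectral axis only, which keeps the forward pass to a handful of small matrix products.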

For comparison purposes we selected two conventional classification approaches among those which have been shown to be effective in handling reflectance spectral data in this and other HSI analysis contexts: SVM and Random Forests. Support Vector Machine (SVM) [37] is a popular non-parametric technique for binary classification. It is a suitable tool when the data are not regularly distributed or their distribution is unknown. We implement SVM according to a multi-class one-against-all scheme, with Radial Basis Function (RBF) kernels configured through iterated model selection for each pathogen binary classifier.

Random Forests (RF) [38] is an ensemble learning method that operates by constructing a multitude of decision trees. Through a bagging approach, the trees capture deep insights into the structure of the data. Each tree is built on a different sample of the data, with randomness in the growing phase to ensure dissimilarity among trees. The class with the most votes (among all the trees in the forest) determines the prediction. The use of randomness and averaging improves the predictive accuracy and counters overfitting. We also tested both SVM and RF combined with information-preserving dimensionality reduction obtained by Principal Component Analysis (PCA) [39], used to reduce spectral redundancy while retaining 99.9% of the variance.
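As an example of the dimensionality reduction step, the sketch below selects the smallest number of principal components whose cumulative explained variance reaches 99.9%. It uses a plain SVD in NumPy; the function names and the synthetic data are hypothetical, and this is not necessarily how PCA was implemented for the experiments.

```python
import numpy as np


def pca_fit(X, retained=0.999):
    """Return the mean and the leading components retaining the given variance.

    X: data matrix, shape (n_samples, n_features), e.g. one CSS per row.
    """
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index at which the cumulative variance reaches the target.
    k = min(int(np.searchsorted(cumulative, retained)) + 1, s.size)
    return mean, Vt[:k]


def pca_transform(X, mean, components):
    return (X - mean) @ components.T


# Synthetic data: 10 features driven by 2 latent factors plus tiny noise.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))
latent = rng.normal(size=(200, 2)) * np.array([10.0, 5.0])
X = latent @ basis.T + 1e-6 * rng.normal(size=(200, 10))
mean, components = pca_fit(X)
X_reduced = pca_transform(X, mean, components)
```

For strongly correlated spectral bands, a high retention threshold such as 99.9% can still discard most dimensions, which is what makes the reduction attractive for spectral data.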

3 Results and Discussion

We built and analyzed a database of 16642 colonies streaked and grown on Petri dishes (5% sheep blood agar plates, BBL, BD Diagnostics, Sparks, MD) from 106 HSI volumes acquired after 18 hours of incubation in \(O_{2}\). Target pathogens in our analysis, all belonging to the American Type Culture Collection (ATCC), and covering over 85% of UTI species of interest, are: E.coli (5539 colonies), E.faecalis (1958), S.aureus (2355), P.mirabilis (2315), P.vulgaris (654), K.pneumoniae (542), Ps.aeruginosa (1529) and Str.agalactiae (1750). Representative colony examples (from RGB images) and corresponding average spectral signatures are shown in Figs. 1 and  3 respectively. The whole dataset has been licensed for research use and can be accessed on http://www.microbia.org.

Classification Accuracy. Bacterial species classifiers based on CNN, as well as on SVM and RF (with or without PCA), have been implemented and compared on the experimental dataset. Classification performance in terms of average accuracy is reported in Table 1.

Table 1. Classification accuracy (average and standard deviation). Configurations marked with an asterisk are those considered in Fig. 4.

Fig. 4. Computational and memory footprint performance of the classifiers versus the number of training samples: training times (solid line), testing times (dashed line) and memory footprint (dotted line).

The selected CNN model, after 50,000 training iterations, reached an accuracy of 99.7%, becoming our best option. A learning rate of 0.01 and a learning decay of 0.005 were selected after many different tests, which led to the following observations: (a) by increasing the number of convolutional and/or FC layers we obtained minor improvements at the cost of more than double the training/testing time; (b) comparable classification results can be obtained with learning rates between 0.005 and 0.01, while using 0.05 leads to a lack of convergence for all tested configurations except a suboptimal one with single Conv and FC layers; (c) with a dropout of 0.75 and a momentum of 0.9 (commonly adopted values) we preserve both the network structure and the highest accuracy levels, with training and test timings fully acceptable to guarantee FLA-compatible near real-time classification (see below).

Accuracy assessments are based on a 70/30 random split into training and validation sets for each class in the database, repeated five times following a Shuffle & Split cross-validation approach. For the CNN solution it is particularly significant to assess how the method behaves as the size of the training set decreases. We therefore considered different percentages of the training set and, in Fig. 5(a), we track accuracy as a function of the number of learning iterations. Several curves are used to show how accuracy increases with the training set size (we already reach an accuracy greater than 99% by using only 15% of the training set).
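The per-class 70/30 Shuffle & Split protocol can be sketched as follows. This is an illustrative NumPy version; the function name, the random seed and the rounding rule for the per-class training size are assumptions.

```python
import numpy as np


def stratified_shuffle_split(labels, train_fraction=0.7, n_repeats=5, seed=0):
    """Yield (train_idx, val_idx) pairs, splitting each class independently."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    for _ in range(n_repeats):
        train_idx, val_idx = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            n_train = int(round(train_fraction * idx.size))  # assumed rounding
            train_idx.append(idx[:n_train])
            val_idx.append(idx[n_train:])
        yield np.concatenate(train_idx), np.concatenate(val_idx)


# Toy labels with unbalanced classes (100, 50 and 10 samples).
labels = np.repeat([0, 1, 2], [100, 50, 10])
splits = list(stratified_shuffle_split(labels))
```

Splitting each class separately keeps the class proportions identical in the training and validation sets, which matters here because the number of colonies per species is strongly unbalanced.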

Though slightly inferior to CNN, SVM also reached comparable performance, showing an accuracy peak of \(99.5\%\) without PCA, while we observed an accuracy drop of \({>}1\%\) when adopting dimensionality reduction. It is therefore possible to create hyper-surfaces (thanks to RBF kernels) that accurately separate the analyzed classes.

RF was used in a non-standard way. When making predictions on the test dataset, we exploited every tree in the forest in order to benefit from averaging the predictions, and a decision is taken only if 70% of the forest agrees. This may decrease the overall accuracy, but it increases classification precision and reduces wrong predictions for test samples that bring new factors not present in the training set (such as species not yet considered or other undesirable alterations). Using this configuration, RF obtains its best performance with PCA (\(97.1\%\), i.e. \(+3.3pp\) with respect to the baseline).
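The agreement rule described above can be illustrated with a short sketch that takes the per-tree predictions as input. The function name, the reject label of -1 and the choice of an inclusive threshold (at least 70%) are assumptions; how the per-tree votes are obtained from the forest is not shown.

```python
import numpy as np


def predict_with_agreement(tree_votes, n_classes, min_agreement=0.7, reject_label=-1):
    """Majority vote that abstains when the forest does not agree enough.

    tree_votes: integer array (n_trees, n_samples) of per-tree class predictions.
    Returns the decisions (reject_label where no decision is taken) and the
    fraction of trees voting for the winning class.
    """
    n_trees, n_samples = tree_votes.shape
    counts = np.zeros((n_samples, n_classes), dtype=int)
    for c in range(n_classes):
        counts[:, c] = (tree_votes == c).sum(axis=0)
    winner = counts.argmax(axis=1)
    share = counts.max(axis=1) / n_trees
    return np.where(share >= min_agreement, winner, reject_label), share


# Toy forest of 10 trees voting on 3 samples.
votes = np.array([
    [2, 1, 0], [2, 1, 0], [2, 1, 0], [2, 1, 0], [2, 1, 0],
    [2, 1, 0], [2, 1, 1], [2, 0, 1], [2, 0, 1], [2, 0, 1],
])
decisions, share = predict_with_agreement(votes, n_classes=3)
```

In the toy example the third sample receives only 60% of the votes for its leading class, so no decision is taken for it; this is the mechanism that trades some overall accuracy for fewer wrong predictions.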

Fig. 5. (a) CNN dataset evaluation and (b) confusion matrix with \(99.7\%\) accuracy.

Computational and Scalability Assessment. Fast CSS classification is needed, especially in the context of FLA. Classifiers show strong differences in terms of computational efficiency and scalability with respect to the dataset size. This section analyses training time (Fig. 4, solid line), testing time (Fig. 4, dashed line) and classifier memory footprint (Fig. 4, dotted line) versus the number of training samples. We used a standard PC for all classifiers (Intel Core i5-3470 CPU 4\(\,\times \,\)3.2 GHz, 16 GB RAM) except CNN (Intel Core i7-5930K CPU 12\(\,\times \,\)3.5 GHz, 32 GB RAM, GeForce GTX TITAN X). In terms of training time, SVM and CNN are the slowest of the proposed solutions, taking almost two orders of magnitude longer than RF. The SVM training time has a steeper slope than that of CNN, which maintains similar values across dataset sizes (meaning better scalability). RF generates many decision trees (1000 in the configurations applied to our dataset); to reduce its training time it is possible to prune some branches (or to limit the branching depth), in a possible trade-off with accuracy. The cardinality of the dataset has little influence on the classification (testing) time. RF is the slowest (while CNN is the fastest) because every sample must flow along every decision tree. The classifier memory footprint rises with the dataset size for SVM and RF, while it remains constant for CNN. In absolute terms, RF requires much more space than CNN and SVM. Though not the best in terms of accuracy, RF demonstrates a high level of precision (low false positives) and facilitates extrapolation of additional information. SVM, on the other hand, shows high accuracy. CNN proved to be the best solution in terms of accuracy, memory footprint and testing time, as well as scalability with respect to the dataset size, while its training time can be limited by using GPUs and dedicated hardware.

Considerations and Future Directions. Figure 5(b) shows the confusion matrix for the best CNN configuration. We observe only a few mutual misclassifications between Ent. faecalis and Str. agalact., and between Esch. coli and Kleb. pneum., pairs of pathogens that produce colonies which are hardly distinguishable visually. They are also roughly similar spectrally (though on average separated by a bias term, see Fig. 5). Notably, very few misclassifications exist between Proteus vulg. and the non-swarming Proteus mirab. (also almost impossible to discriminate visually) while, in a previous work [27], these classes were not distinctly separated and were therefore merged into a single class. The capability to discriminate between two species of the same bacterial genus, as for Proteus, is of high application value, and this result is evidence of the improved CSS extraction introduced in this work.

Considering classification accuracy, complexity of the structure, memory footprint, and training and testing times, the CNN-based method is the best of the analyzed bacterial identification pipelines. However, near-perfect species differentiation reveals the need and the opportunity to further increase the number of considered pathogens as well as the size and variability of the dataset (e.g. by including plates coming from clinical specimens). Even if, in the same experimental setting, conventional classification methods also reached high classification performance, we can expect DL-based approaches to be more appropriate in the presence of the scalability needs and variability factors that will have to be considered in order to bring this HSI technology closer to clinical application.

4 Conclusion

We verified the possibility of applying a deep learning approach to UTI bacteria identification by using HSI technology operating in the VNIR spectrum. Our CNN-based solution obtained the highest classification accuracy on a large laboratory dataset, notwithstanding the significant number of analyzed pathogens and the fact that pathogen spectral signature differentiation is challenging and made even harder by spectral mixing with the growing media. There are also notable differences in terms of scalability (training time, testing time and memory used), which drove the selection of our CNN implementation over the alternative methods. Improvements over previous works have also been obtained thanks to a better data acquisition setup and a more reliable CSS assessment. This study suggests that further investigations are desirable to make our deep learning pipeline functional in a real clinical lab environment. Future activities should take into account an even higher number of UTI-relevant pathogens and clinical laboratory validations.