Deep Neural Networks Approach to Microbial Colony Detection—A Comparative Analysis

Majchrowska, Sylwia; Pawłowski, Jarosław; Czerep, Natalia; Górecki, Aleksander; Kuciński, Jakub; Golan, Tomasz

doi:10.1007/978-3-031-11432-8_9

Sylwia Majchrowska^15,16,
Jarosław Pawłowski^15,16,
Natalia Czerep^15,17,
Aleksander Górecki^15,16,
Jakub Kuciński^15,17 &
…
Tomasz Golan¹⁵

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 440))

Included in the following conference series:

Conference on Multimedia, Interaction, Design and Innovation

5088 Accesses
2 Citations

Abstract

Counting microbial colonies is a fundamental task in microbiology and has many applications in numerous industry branches. Despite this, current studies towards automatic microbial counting using artificial intelligence are hardly comparable due to the lack of unified methodology and the availability of large datasets. The recently introduced AGAR dataset is the answer to the second need, but the research carried out is still not exhaustive. To tackle this problem, we compared the performance of three well-known deep learning approaches for object detection on the AGAR dataset, namely two-stage, one-stage, and transformer-based neural networks. The achieved results may serve as a benchmark for future experiments.

You have full access to this open access chapter, Download conference paper PDF

Annotated dataset for deep-learning-based bacterial colony detection

Article Open access 28 July 2023

Deep convolutional neural network: a novel approach for the detection of Aspergillus fungi via stereomicroscopy

Article 29 March 2021

A comprehensive review of image analysis methods for microorganism counting: from classical image processing to deep learning approaches

Article 29 September 2021

Keywords

1 Introduction

The ability to automatically and accurately detect, localize, and classify bacterial and fungal colonies grown on solid agar is of wide interest in microbiology, biochemistry, food industry, or medicine. An accurate and fast procedure for determining the number and type of microbial colonies grown on a Petri dish is crucial for economic reasons - industrial testing often relies on proper determination of colony forming units (CFUs). Conventionally, the analysis of samples is performed by trained professionals, even though it is a time-consuming and error-prone process. To avoid these issues, an automated methodology based on artificial intelligence can be applied.

A common way of counting and classifying objects using deep learning (DL) is to first detect them and then count the found instances distinguishing between different classes. We compared the results of microbe colony counting using selected detectors belonging to different classes of neural network architectures, namely two-stage, one-stage, and transformer-based models.

Our experiments were conducted on the Annotated Germs for Automated Recognition (AGAR) dataset with higher-resolution subset [10], which consists of around 7k annotated Petri dish photos with five microbial classes. This paper focuses on setting benchmarks for the detection and counting tasks using state-of-the-art (SoTA) models.

2 Object Detection

Object detection was approached using two major types of DL-based architectures, namely two-stage and one-stage models. Two-stage detectors find class-agnostic object proposals in the first stage, and in the second stage the proposed regions are assigned to the most likely class. They are characterised by high localization and classification accuracy. The largest group of two-stage models are Region Based Convolutional Neural Networks (R-CNN) [6], whose main idea is based on extracting region proposals from the image. Over the years, the networks from this family have undergone many modifications. In the case of Faster R-CNN [15] architecture, a Region Proposal Network (RPN) was used instead of the Selective Search algorithm. This allows for significant reduction of the model’s inference time. In order to reduce issues with over-fitting during training, Cascade R-CNN [2] was introduced as multi-stage object detector, which consists of multiple connected detectors that are trained with increased intersection over union (IoU) thresholds. A year later, authors of Libra R-CNN [11] focused on balancing the training process by IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss. In the following years, researchers used Faster R-CNN while replacing its backbone (CNN used as a feature extractor) with newer architectures. The most recent concept is Composite Backbone Network V2 [8] (CBNetV2), which groups multiple pretrained backbones of the same kind for more efficient training.

On the other hand, the one-stage architectures are designed to directly predict classes and bounding box locations. For this reason, one-stage detectors are faster, but usually have relatively worse performance. Single-stage detection was popularized in DL mainly by You Only Look Once models (YOLO v1 [12], 9000 [13], v3 [14], v4 [1]), primarily developed by Joseph Redmon. Recently, the authors of YOLOv4 [1] have enhanced the performance of YOLOv3 architecture using methods such as data augmentation, self-adversarial training, and class label smoothing, all of which improve detection results without degrading the inference speed. Moreover, the authors of EfficientDet [16] introduce changes which contribute to an increase in both accuracy and time performance of object detection. The main proposed changes include using weighted Bidirectional Feature Pyramid Network (BiFPN), compound scaling, and replacing the backbone network with EfficientNet [17]. EfficientNet as a backbone connects with the idea of scalability from compound scaling, which allows the model to be scaled to different sizes and to create a family of object detectors for specific uses.

Additionally, the transformers have recently become a next generation of neural networks for all computer vision applications, including object detection [3, 5, 19]. The interest in replacing CNN with transformers is mainly due to their efficient memory usage and excellent scalability to very large capacity networks and huge datasets. The parallelization of transformer processes is achieved by using an attention mechanism applied to images split into patches treated as tokens. The utilization of the transformer architecture to generate predictions of objects and their position in an image was first proposed in DEtection TRansformer (DETR) network [3]. The architecture uses a CNN backbone to learn feature maps, then feeds transformer layers. In comparison to DETR, Deformable DETR [19] network replaces self-attention in the encoder and cross-attention in the decoder with multi-scale deformable attention and cross-attention. Deformable attention modules only attend to a small set of key sampling points around a reference point which highly speeds up the training process. The recently introduced Cross-Covariance Image Transformers (XCiT) [5] concept is a new family of transformer models for image processing. The idea is to use a transformer-based neural network as a backbone for two-stage object detection networks. XCiT splits images into fixed size patches and reduces them into tokens with a greater number of features with the use of a few convolutional layers with Gaussian Error Linear Units (GELU) [7] in between. The idea behind the model is to replace self-attention with transposed attention (which is over feature maps instead of tokens).

3 AGAR Dataset

The AGAR dataset [10] contains images of microbial colonies on Petri dishes taken in two different environments, which produced higher resolution and lower resolution images. The differences are between the lighting conditions and apparatuses. Higher resolution images, which were used in our studies, can be divided into bright, dark and vague subgroups. On the other hand, considering the number of colonies, samples can be defined as empty, countable and uncountable. The dataset includes five classes, namely E.coli, C.albicans, P.aeruginosa, S.aureus, B.subtilis, while annotations are stored in json format with information about the number and type of microbe, environment and coordinates of bounding boxes.

In this paper, we present the results of experiments performed using a subset of the AGAR dataset, which consists of 6990 images in total. In our case only higher resolution (mainly \(4000\times 4000\) px), dark and bright, without vague, samples with countable number of colonies were chosen. Firstly, images were split into train and validation subsets (the same for each experiment), and then divided into \(512\times 512\) px patches as described in [10]. At the end—in the test stage—whole images from the validation subset of the Petri dish were used (for a detailed description of the procedure see Supplementary materials from [10]).

4 Benchmarking Methodology

We compared the performance of the selected models using several metrics: architecture type and size, inference time, and detection and counting accuracy.

During time measurements, the inference was executed on GeForce GTX 1080 Ti GPU using the same patch with 6 ground truth instances. The models were first loaded into memory, then inferred 100 times sequentially (ignoring the first 20 times for warming up) to calculate averaged time and its standard deviation for each model separately.

As to detection results, the detector performance was evaluated twofold - by measuring the effectiveness of detection and counting. As an evaluation metric for colony detection, we rely on the mean Average Precision (mAP), to be precise mAP@.5:.95, averaged over all 5 classes. The efficiency of colony counting was measured based on Mean Absolute Error (MAE), and Symmetric Mean Absolute Percentage Error (sMAPE).

With the growing popularity of DL, many open source software libraries implementing SoTA object detection algorithms emerge. Results provided for Faster R-CNN and Cascade R-CNN were taken from [10] for comparison purposes. Similarly, in our experiments we relied on MMDetection [4] framework (Libra R-CNN, CBNetV2, Deformable DETR, XCiT), Alexey Bochkovskiy’s Darknet-based implementation of YOLOv4 [1], and Ross Wightman’s PyTorch [18] reimplementation of official EfficientDet’s TensorFlow implementation. To perform model training, we used the default parameters as for COCO dataset in the above-mentioned implementations. In all experiments, we used transfer learning technique based on pretrained backbones (ResNet-50, Double ResNet-50, XciT-T12) on ImageNet or whole architecture (YOLOv4, EfficientDet-D2, Deformable DETR) on MS COCO dataset. In the case of YOLOv4, we changed the input size to \(512\times 512\) px in order to match the size of the generated patches. We used pretrained backbones in all experiments. Traditional two- and one-stage networks were trained with Stochastic Gradient Descent (SGD) optimizer, as opposed to Transformer based architectures, where AdamW [9] optimizer was used. The values of initial learning rate vary between \(10^{-3}\) and \(10^{-5}\) for each model. All networks were trained until loss values saturated for the validation subset. We also chose commonly used augmentation strategies of selected models, like flips, crops, and resizes of images.

4.1 Results

Mean averaged precisions presented in Table 1 are averaged over all microbe classes. Calculated value of mAP@.5:.95 varies between 0.491 and 0.529. The most efficient results in terms of accuracy and inference speed were achieved for YOLOv4 architecture. On the other hand, transformer-based architectures present slightly worse performance. Some interesting cases were presented in Fig. 1. The selected image presents the same microbial species (P.aeruginosa), which forms two different sizes of colonies due to agar inhomogeneity, making detection even more challenging. Labeled small contamination is not perceived by all models (transformer based and EfficientDet-D2), and some of them (YOLOv4, Deformable DETR) also have problems with precise localization of blurred colonies. Two-stage detectors have a tendency to produce some excessive predictions.

Table 1. Benchmarks for tested models on the higher-resolution subset of AGAR dataset. The model size is given in terms of number of parameters (in millions). In case of XCiT model number of backbone’s parameters is given in brackets.

Full size table

The performance of the selected architectures for microbial counting is presented in both Table 2 and Fig. 2, while Table 3 shows all five microbial species separately. In general, all detectors perform better for microbes that form clearly visible, separate colonies. The biggest problem with locating individual colonies was observed for P.aeruginosa, where the tendency for aggregation and overlapping is the greatest. Overall, the best results were obtained for the YOLOv4 model, where the predicted count of microbial colonies is the closest to ground truth in range from 1 to 50 instances (see Fig. 2) – the most operable scope for industrial applications. The straight black lines represent a 10% error to highlight the acceptable error range indicated by microbiologists. The worst performance was observed for the EfficientDet-D2 model – where small instances of microbial colonies were omitted (not localized at all), which may be caused by resizing the patches to fit the input layer size. Very low contrast between the agar substrate and the colony (bright subset of AGAR dataset) is an additional problem here.

Table 2. Symmetric Mean Absolute Percentage Error (sMAPE) and Mean Absolute Error (MAE) obtained for different models.

Full size table

Table 3. Symmetric Mean Absolute Percentage Error (sMAPE) obtained for different types of microbes.

Full size table

5 Conclusions

In the conducted studies, we analyzed eight SoTA deep architectures in terms of model type, size, average inference time, and the accuracy of detecting and counting microbial colonies from images of Petri dishes. A detailed comparison was performed on AGAR dataset [10].

The presented results do not differ much between the different types of architectures. It is worth noting that we chose rather smaller, typical backbones for the purposes of this comparison to create a baseline benchmark for different types of detectors. It appeared that the most accurate (\(\mathrm {mAP}=0.529\)) and the fastest model (17 ms) is one-stage YOLOv4 network, making this model an excellent choice for industrial applications. Two-stage architectures of different types and kinds achieved moderate performance, while transformer-based architectures gave the worst results. EfficientDet-D2 turned out to be the smallest model in terms of the number of parameters.

Our experiments yet again confirm the great ability of DL-based approaches to detect microbial colonies grown in Petri dishes from RGB images. The biggest challenge here is the need to collect large amounts of balanced data. To train detectors in a fully-supervised manner, data must be properly labelled. However, identification of abnormal colonies grown in a Petri dish can be difficult even for a trained specialist - overlooked microbes were usually small and located at the edge of a Petri dish. Usually 10% error is an acceptable range for manual labeling. It is worth to point out that AI-assisted models can make predictions even for overcrowded dishes, which is a big advantage of this type of technology. They are not prone to fatigue like a human, however, variable lighting conditions can make detection even more difficult, which can be observed in our case for EfficientDet-D2 prediction for unrepresented bright samples.

References

Bochkovskiy, A., et al.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Cai, Z., et al.: Cascade R-CNN: delving into high quality object detection. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018). https://doi.org/10.1109/CVPR.2018.00644
Carion, N., et al.: End-to-end object detection with transformers. arXiv preprint arXiv:2005.12872 (2020)
Chen, K., et al.: MMDetection: open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
El-Nouby, A., et al.: XCiT: cross-covariance image transformers. arXiv preprint arXiv:2106.09681 (2021)
Girshick, R., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524 (2014)
Hendrycks, D., et al.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2020)
Liang, T., et al.: Cbnetv2: a composite backbone network architecture for object detection. arXiv preprint arXiv:2107.00420 (2021)
Loshchilov, I., et al.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2019)
Majchrowska, S., et al.: Agar a microbial colony dataset for deep learning detection. arXiv preprint arXiv:2108.01234 (2021)
Pang, J., et al.: Libra R-CNN: towards balanced learning for object detection. arXiv preprint arXiv:1904.02701 (2019)
Redmon, J., et al.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Google Scholar
Redmon, J., et al.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
Google Scholar
Redmon, J., et al.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS 2015, pp. 91–99. MIT Press, Cambridge (2015)
Google Scholar
Tan, M., et al.: Efficientdet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10778–10787 (2020). https://doi.org/10.1109/CVPR42600.2020.01079
Tan, M., et al.: EfficientNet: rethinking model scaling for convolutional neural networks. In: Chaudhuri, K., et al. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 6105–6114. PMLR (2019)
Google Scholar
Wightman, R.: Efficientdet (pytorch) (2020). https://github.com/rwightman/efficientdet-pytorch
Zhu, X., et al.: Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)

Download references

Acknowledgements

Project “Development of a new method for detection and identifying bacterial colonies using artificial neural networks and machine learning algorithms” is co-financed from European Union funds under the European Regional Development Funds as part of the Smart Growth Operational Program. Project implemented as part of the National Centre for Research and Development: Fast Track (grant no. POIR.01.01.01-00-0040/18).

Author information

Authors and Affiliations

NeuroSYS, Rybacka 7, 53-656, Wrocław, Poland
Sylwia Majchrowska, Jarosław Pawłowski, Natalia Czerep, Aleksander Górecki, Jakub Kuciński & Tomasz Golan
Wroclaw University of Science and Technology, Wybrzeże S. Wyspiańskiego 27, 50-372, Wroclaw, Poland
Sylwia Majchrowska, Jarosław Pawłowski & Aleksander Górecki
University of Wroclaw, Fryderyka Joliot-Curie 15, 50-383, Wroclaw, Poland
Natalia Czerep & Jakub Kuciński

Authors

Sylwia Majchrowska
View author publications
You can also search for this author in PubMed Google Scholar
Jarosław Pawłowski
View author publications
You can also search for this author in PubMed Google Scholar
Natalia Czerep
View author publications
You can also search for this author in PubMed Google Scholar
Aleksander Górecki
View author publications
You can also search for this author in PubMed Google Scholar
Jakub Kuciński
View author publications
You can also search for this author in PubMed Google Scholar
Tomasz Golan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sylwia Majchrowska .

Editor information

Editors and Affiliations

National Research Institute, National Information Processing Institute, Warsaw, Poland
Cezary Biele
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Janusz Kacprzyk
Polish-Japanese Academy of Information Technology, Warsaw, Poland
Wiesław Kopeć
Systems Research Institute, Polish Academy of Science, Warsaw, Poland
Jan W. Owsiński
Institute of Applied Computer Science, Faculty of Electrical, Electronic, Computer and Control Engineering, Łódż University of Technology, Łódź, Poland
Andrzej Romanowski
Department of Applied Informatics in Management, Faculty of Management and Economics, Gdańsk University of Technology, Gdańsk, Poland
Marcin Sikorski

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Majchrowska, S., Pawłowski, J., Czerep, N., Górecki, A., Kuciński, J., Golan, T. (2022). Deep Neural Networks Approach to Microbial Colony Detection—A Comparative Analysis. In: Biele, C., Kacprzyk, J., Kopeć, W., Owsiński, J.W., Romanowski, A., Sikorski, M. (eds) Digital Interaction and Machine Intelligence. MIDI 2021. Lecture Notes in Networks and Systems, vol 440. Springer, Cham. https://doi.org/10.1007/978-3-031-11432-8_9

Download citation

DOI: https://doi.org/10.1007/978-3-031-11432-8_9
Published: 27 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11431-1
Online ISBN: 978-3-031-11432-8
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics

Deep Neural Networks Approach to Microbial Colony Detection—A Comparative Analysis

Abstract

Similar content being viewed by others

Annotated dataset for deep-learning-based bacterial colony detection

Deep convolutional neural network: a novel approach for the detection of Aspergillus fungi via stereomicroscopy

A comprehensive review of image analysis methods for microorganism counting: from classical image processing to deep learning approaches

Keywords

1 Introduction

2 Object Detection

3 AGAR Dataset