Introduction

Considerable effort has gone into developing sensor solutions to detect, monitor, and identify airborne biological agents. Despite the variety of methods behind these solutions, no standard, interoperable EU-wide approach is available for setting biothreat monitoring thresholds, either outdoors or within critical infrastructure. Several research and development studies suggest that bacteria can survive aerosol transport and can travel much longer distances between hosts than previously thought [2, 5, 8, 9, 10]. The sporulated form is not the only option for airborne bacterial dissemination [6], which highlights the importance of classical optical methods powered by AI-supported solutions for rapid monitoring and detection.

A recent study carried out by the Pentagon identified several economic barriers to adopting genomic analysis methods for routine bioaerosol monitoring and biothreat detection [7]. This shows that we are unprepared for real-time, or at least frequent, monitoring of airborne biocontent.

With the currently available solutions for pathogen detection, there is a trade-off between time and accuracy. While genomic methods remain the gold standard for genus- and strain-level identification of bacteria, classical optical methods, such as the various forms of quantitative phase imaging microscopy, offer the possibility of rapid and automated detection of suspicious pathogens in water- or air-based samples when powered by AI-supported solutions.

One of the reasons why there is no existing standard, interoperable, and real-time or quasi-real-time, optical sensor-based biothreat monitoring solution is the lack of platforms capable of comparative verification, monitoring, and data archiving for traceable intermethod comparison and cross-validation.

There are several other reasons behind the difficulties of standardization and interoperability in optical sensor-based rapid biothreat monitoring and detection. We do not intend to discuss the factors listed below in detail in this short publication, but it is important to mention some of them to highlight the complexity of the problem:

  • The variety of air sampling methods and devices, together with the lack of knowledge about how the limit of detection differs between air sampling solutions, makes it even more difficult to establish protocols for bioaerosol monitoring [1, 3]. The current solution is to focus on use case-specific applications and to comply with the well-established, relevant NATO (STANAG) and ISO standards regarding the overall sampled volume of air, to avoid statistical down-sampling.

  • The lack of widely accepted air quality standards [3] makes it very difficult to determine the exact baseline of the “background noise” in the air [4, 8, 11], that is, the nonbiological particle components. This baseline is needed to optimize and fine-tune the AI-supported solutions for pathogen detection.

  • The currently accepted standard pathogen detection and identification protocols are based on culturing after sample collection, which significantly lengthens the process [3] and also increases the methodological diversity between the existing protocols.

Even if an AI-supported optical sensor-based platform existed, there are no standard testing and validation protocols to evaluate its application-specific performance metrics. This is the topic of this publication. We present recent results on the performance of the “DataSenseLabs AI-supported biothreat detection platform” obtained with two different approaches, field data (air sample) collection and computer simulation-based testing, both of which are part of our cross-validation-based development strategy.

Since disease control authorities are highly vigilant regarding the environmental presence of the most dangerous and unfortunately well-known member of the Bacillus cereus group [4, 12, 13], and since all the components necessary to create a virulent Bacillus anthracis strain are very easy to access, our AI-supported biothreat detection platform is currently being fine-tuned for the detection of bacillus form objects sampled from the air.

Methods and Results

In terms of process flow, the AI-supported pathogen detection of the “DataSenseLabs AI-supported biothreat detection platform” is currently preceded by three sequential steps: (1) air sample collection; (2) sample preparation; and (3) light microscopic measurement. In the current phase of the R&D, the optical microscopic measurements were carried out with a digital holographic microscope (DHM-HoloZcan-EPro2) provided by the EU Horizon-supported HoloZcan project, and with a reference differential interference contrast (DIC) light microscope (NIKON Eclipse Ti2).

The DataSenseLabs AI platform was fine-tuned and tested on three different sample types: a laboratory-made mixture of different bacterial strains including Bacillus subtilis (ATCC 6533); environmental field samples collected with the Coriolis Compact air sampling device manufactured by Bertin Technologies; and computer-simulated holographic particle populations with a diameter range (0.5 < d < 2 μm) and refractive index (R = 1.4) compatible with the optical properties of bacteria. In the case of field samples, the air samples were dissolved in 1 ml of physiological saline on site and placed on standard microscopic slides for the DHM and DIC measurements.

The data analysis pipeline, developed in MATLAB 2022b and Python, consists of the following steps:

  1. Use case-specific database building from the captured microscopic image datasets. Currently, DHM and DIC image data input is supported by the “DataSenseLabs AI-supported biothreat detection platform.”

  2. Extension of the databases with the metadata of annotated regions of interest (ROIs). For supervised deep learning algorithm development, the ROIs of bacterial cells, serving as the reference or “ground truth,” were labeled with rectangular bounding boxes in the MATLAB Image Labeler app.

  3. Asking the right questions: dividing the database into context-specific and use case-specific training and test sets.

  4. Optimization of the convolutional neural network (CNN) model parameters by training and data augmentation.

    (a) Optimizing the training process by finding the optimal number of training iterations based on the mini-batch accuracy and loss function values.

    (b) Image minibatches were created by dividing the original raw images into several smaller images with pixel dimensions of 300 × 340. Training was conducted with both overlapping and nonoverlapping minibatches. During training, the saturation and exposure of the input images were modified within a 10% range to diversify the training set.

  5. Testing the trained model by comparing the reference (ground truth) and predicted values and metrics.

  6. Drawing conclusions from the results and restarting the process from step 3, step 2, or even step 1.
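Step 4(b) can be illustrated with a minimal Python sketch of how a raw image might be divided into 300 × 340 sub-images with or without overlap, and how exposure and saturation factors might be drawn. This is not the platform's actual code. The names `tile_boxes` and `jitter_factors`, the row-by-column reading of "300 × 340," the overlap being expressed as a fraction, and the interpretation of the "10% range" as a multiplicative factor within ±10% are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Assumed orientation: 300 rows by 340 columns (the text states only "300 × 340").
TILE_H, TILE_W = 300, 340


def tile_origins(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets of tiles of size `tile` along an axis of size `length`."""
    if length < tile:
        return []
    origins = list(range(0, length - tile + 1, stride))
    # Add a final tile flush with the border so that no pixels are left out.
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins


def tile_boxes(height: int, width: int,
               overlap: float = 0.0) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) crop boxes covering the image.

    `overlap` is the assumed fraction of a tile shared with its neighbour;
    0.0 gives nonoverlapping tiles.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride_h = max(1, int(TILE_H * (1.0 - overlap)))
    stride_w = max(1, int(TILE_W * (1.0 - overlap)))
    return [(top, left, top + TILE_H, left + TILE_W)
            for top in tile_origins(height, TILE_H, stride_h)
            for left in tile_origins(width, TILE_W, stride_w)]


def jitter_factors(rng: random.Random,
                   max_change: float = 0.10) -> Tuple[float, float]:
    """Draw multiplicative (exposure, saturation) factors within ±max_change."""
    exposure = 1.0 + rng.uniform(-max_change, max_change)
    saturation = 1.0 + rng.uniform(-max_change, max_change)
    return exposure, saturation
```

For a 600 × 680 image, `tile_boxes(600, 680)` yields four nonoverlapping boxes, while `tile_boxes(600, 680, overlap=0.5)` yields nine overlapping ones. The bounding-box annotations from step 2 would have to be shifted and clipped to each crop box, which is omitted here.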

To support the original interoperability and cross-validation concept of the “DataSenseLabs AI-supported biothreat detection platform,” multi-data input has been developed so that the results of the AI-supported detection can be compared across several light microscopic image databases. In this publication, we show summary statistics only for the DHM and DIC image datasets.

The two figures below (Fig. 10.1a and b) explain the concept of the cross-reference database-building strategy for the AI-supported platform development:

Fig. 10.1
A. Two microscopic images of cross-validation database building for Gram-stained and non-stained Bacillus simulant samples from air. B. A block diagram illustrating data collection for AI interface testing, development, and algorithm, complex image database, image annotation, data generation, and DHM and LM devices.

(a) The cross-reference database-building approach combines image databases captured from laboratory-made (InLab: on the left) and field-collected (OnField: on the right) samples. Blue lines indicate the optimal structure of workflow iteration during the R&D process: within the AI development process, within the calibration-testing-validation process, and between these two different but interconnected processes. (b) A Gram-stained DHM image (on the left) and a nonstained DIC image (on the right) of the same “Bacillus simulant” strain that was used during the “field data collection, testing, training and demonstration” events. The “DataSenseLabs AI-supported biothreat detection platform” can receive both types of light microscopic images to estimate the number of suspicious bacillus form objects within the air sample.

As shown in Fig. 10.1a, we followed a custom-designed database-building strategy during the research and development of the “DataSenseLabs AI-supported biothreat detection platform.” We have built seasonal and geolocation-specific multi-image databases from air samples in order to estimate the overall diameter distribution of the naturally occurring biological and nonbiological particle components (“background noise”), since these can be crucial carriers of long-distance bacterial dissemination [4, 6, 8, 10, 11]. Three different approaches have been worked out to develop the databases for AI algorithm development and testing:

  • Calibration-oriented databases including the diameter values (in μm) of the seasonally occurring particles in the air, sampled with a reference air sampling device, the Coriolis Compact. The samples were taken at the same geolocations (forest region, busy road, indoors) in three different seasons (summer, autumn, and spring).

  • Digitally mixed databases containing the microscopic images of the environmental (background noise) samples and the samples created under laboratory conditions including specific bacterial strains.

  • Directly mixed databases where the environmental fluid samples were directly mixed with the suspensions of specific, known bacterial strains.

The air samples were analyzed by DIC light microscopy to create the reference diameter value databases, which were used to verify the presence of the Bacillus simulant applied during the field data collection (air sample collection) and testing events, as shown in the figure below (Fig. 10.2).

Fig. 10.2
Two histograms and two boxplots compare particle diameters from two air samples. The top plots depict a sample with a mean diameter of 0.91 μm, and the bottom plots depict a sample with a mean diameter of 1.22 μm. Both histograms display the distribution.

The histogram (on the left), the boxplot (in the middle), and the basic statistical features (on the right) of the measured particle diameters are shown. The results at the top show the diameter distribution of the air samples containing the spontaneously occurring particles at the geolocation “busy road.” The results at the bottom show the diameter distribution of the air samples containing the spontaneously occurring particles plus the artificially administered bacillus simulant particles. The two distributions are significantly different from each other (non-parametric statistical tests: p < 0.01).
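The comparison of two diameter distributions can be sketched in Python as follows. The source does not name the non-parametric test that was applied, so the two-sided permutation test on the difference of means shown here is an assumed stand-in, and the function name `permutation_p_value`, the number of resamples, and the seed are hypothetical.

```python
import random
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_resamples: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of sample means.

    The group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a, n_b = len(a), len(b)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive (standard correction).
    return (hits + 1) / (n_resamples + 1)
```

The function takes two lists of particle diameters in μm, for example the background sample and the sample containing the simulant, and returns a p-value that can be compared against the 0.01 level quoted in the caption. A rank-based test such as Mann-Whitney U would be an equally plausible choice.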

Following the calibration-level verification of the presence of the artificially administered Bacillus simulants, the “DataSenseLabs AI-supported biothreat detection platform” was used to detect the presence of suspicious bacillus form objects within the air sample. The performance metrics are summarized below in Table 10.1a and Table 10.1b:

Table 10.1 Results of the deep learning network model-based predictions of the presence of suspicious bacillus form objects in the field-collected air sample, based on the DIC light microscopic measurement (on the left, Table 10.1a) and on the DHM light microscopic measurement (on the right, Table 10.1b)

Besides the AI algorithm development approach based on field data collection and testing, it is extremely important to test and validate the theoretical performance limits of an AI-supported algorithm. Simulated holographic particle databases have been created with Mie scattering calculations using the Python-based HoloPy tool [14] to evaluate the performance of the “DataSenseLabs AI-supported biodetection platform” under use case-specific conditions, as summarized in Fig. 10.3 and Table 10.2 below.

Fig. 10.3
Two microscopic images against a dark background. The left depicts the original simulation; the right displays annotations for reference and bounding boxes for AI-detected bacterial objects and spores.

The figure shows that the “DataSenseLabs AI-supported biodetection platform” is capable of distinguishing even between spatially extremely close particles in the diameter range of bacterial objects and their spores (0.5–2 μm). On the left: the original simulated image. On the right: the simulated image with the reference annotation shown in yellow and the predicted AI-based detection shown in blue bounding boxes.
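Comparing reference and predicted bounding boxes, as in the figure above, is commonly done by matching boxes on their intersection over union (IoU). The Python sketch below illustrates that general technique and does not describe the platform's own evaluation code. The 0.5 IoU threshold, the greedy one-to-one matching, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(reference: Sequence[Box], predicted: Sequence[Box],
                     threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; returns (true pos., false pos., false neg.)."""
    unmatched: List[int] = list(range(len(reference)))
    true_pos = 0
    for pred in predicted:
        best_idx, best_iou = None, threshold
        for idx in unmatched:
            value = iou(reference[idx], pred)
            if value >= best_iou:
                best_idx, best_iou = idx, value
        if best_idx is not None:
            unmatched.remove(best_idx)  # each reference box is matched once
            true_pos += 1
    return true_pos, len(predicted) - true_pos, len(unmatched)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from the matching counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A stricter evaluation would process the predictions in descending order of confidence score. That ordering is left out here because the source does not describe the detector's output format.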

Table 10.2 The table shows the influence of the particle density within the sample on the performance metrics of the “DataSenseLabs AI-supported biodetection platform”
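To make the notion of particle density concrete, the following Python sketch draws a random, non-overlapping particle population with the diameter range and refractive index quoted earlier (0.5 < d < 2 μm, R = 1.4). It only prepares particle parameters and does not perform the Mie-based hologram calculation. The square field of view, its 50 μm size, the uniform diameter distribution, and all names are hypothetical choices, not the settings of the reported simulations.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Particle:
    x_um: float
    y_um: float
    diameter_um: float
    refractive_index: float


def sample_population(n_particles: int, field_um: float = 50.0,
                      d_range: Tuple[float, float] = (0.5, 2.0),
                      n_index: float = 1.4, seed: int = 0,
                      max_tries: int = 100_000) -> List[Particle]:
    """Place n non-overlapping spheres uniformly at random in a square field."""
    rng = random.Random(seed)
    particles: List[Particle] = []
    tries = 0
    while len(particles) < n_particles:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("field too crowded for the requested density")
        d = rng.uniform(*d_range)
        # Keep the whole particle inside the field of view.
        x = rng.uniform(d / 2, field_um - d / 2)
        y = rng.uniform(d / 2, field_um - d / 2)
        # Reject candidates that would overlap an already placed particle.
        if all(math.hypot(x - p.x_um, y - p.y_um) >= (d + p.diameter_um) / 2
               for p in particles):
            particles.append(Particle(x, y, d, n_index))
    return particles


def number_density(particles: List[Particle], field_um: float) -> float:
    """Particles per square micrometre of the field of view."""
    return len(particles) / field_um ** 2
```

Increasing `n_particles` for a fixed `field_um` raises the density, which is the variable whose influence on the performance metrics Table 10.2 reports. Each generated particle could then be passed to a Mie scattering simulator to render the hologram.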

Conclusion and Further Steps

The “DataSenseLabs AI-supported biodetection platform” supports quantitative phase imaging light microscopic sensor-based data input (DHM, DIC) as well as two different computer-simulated image data inputs for algorithm development, training, testing, and evaluation. The platform’s algorithm system can detect and monitor anomalies in the concentration of bacillus form objects sampled from the air with an accuracy above 80–95%, depending on the study design, sample type, and light microscopic measurement method.

Evaluation of an AI-supported computer vision platform needs standardized, classic image databases, as is also suggested in the ITU-T F.748.12 standard (Table 7–3, pages 4–5) [15]. In addition, we suggest implementing a custom-designed, cross-validation-based three-step evaluation strategy (based on laboratory-made, field-collected, and simulated data), as introduced here for the development and evaluation of our AI platform. The simulated databases will be shared in a separate publication in accordance with the “FAIR” data policy directives [16].

Based on the current results, the platform is capable of supporting CBRN and biothreat surveillance-related decision-making and prestandardization processes, helping to establish solid foundations for interoperability in the field of optical sensor-based and light microscopic measurement-based instant biothreat detection. Following the analysis of the final results in the first part of 2024, we will be able to choose the most appropriate, use case-specific sensor type for miniaturization and industrial-level production.