Performances of automated digital imaging of Gram-stained slides with on-screen reading against manual microscopy

The objective of this study was to evaluate the performances of the automated digital imaging of Gram-stained slides against manual microscopy. Four hundred forty-three identified Gram-stained slides were included in this study. When both methods agreed, we considered the results as correct, and no further examination was carried out. Whenever the methods gave discrepant results, we reviewed the digital images and the glass slides by manual microscopy to avoid incorrectly read smears. The final result was a consensus of multiple independent reader interpretations. Among the 443 slides analyzed in this study, 101 (22.8%) showed discrepant results between the compared methods. The rates of discrepant results according to the specimen types were 5.7% (9/157) for positive blood cultures, 42% (60/142) for respiratory tract specimens, and 22% (32/144) for sterile site specimens. After a subsequent review of the discrepant slides, the final rate of discrepancies dropped to 7.0% (31/443). The overall agreement between the compared methods and the culture results reached 78% (345/443) and 79% (349/443) for manual microscopy and automated digital imaging, respectively. According to culture results, the specificity for automated digital imaging and manual microscopy were 90.8% and 87.7% respectively. In contrast, sensitivity was 84.1% for the two compared methods. The discrepant results were mostly encountered with microorganism morphologies of rare occurrence. The results reported in this study emphasize that on-screen reading is challenging, since the recognition of morphologies on-screen can appear different as compared to routine manual microscopy. Monitoring of Gram stain errors, which is facilitated by automated digital imaging, remains crucial for the quality control of reported Gram stain results.


Introduction
By any measure, this decade has been outstanding in the history of automation in clinical microbiology. Automation enabled not only to customize each analytical step but also to force the laboratory managers to concentrate all the pre-analytical steps onto a unique physical interface that has become the entry door to all further analytical activities of conventional bacteriology. Two automated systems are currently available for clinical specimens streaking and slides preparation: Inoculated media are loaded onto conveyors for transfer to automated incubators where cultures are imaged with highresolution digital pictures at pre-defined times. Direct consequences of total laboratory automation can be measured as improved productivity, traceability, quality, and reduced turn-around times [1][2][3][4][5][6]. But despite the implementation of these novel technologies, some traditional techniques (e.g., Gram stain) continue to bear an important role in the diagnostic process. The Gram-stained smears remain important as a pre-analytical indicator of respiratory tract specimen quality (e.g., sputum), for presumptive etiologic diagnosis, to guide empirical therapy and to indicate the presence of mixed aerobic and anaerobic infections. The Gram stain has therefore been a cornerstone for clinical bacteriology laboratories for over a century, despite the subjectivity of the results interpretation (highly operator-dependent) and the manual nature of the staining process. The interpretation of Gram stain results continues to be labor-intensive, time-consuming, and strongly dependent on the quality of the samples. To face the increasing workload in clinical microbiology laboratories, automated slide scanning and imaging might provide several advantages and adequately complement manual testing. Nonetheless, many technical challenges should be overcome before Gram-stain automation can be systematically deployed in bacteriology. For instance, the quality of the staining remains strongly affected by the smear preparation (markedly different for a thick sputum or a biopsy versus for a blood culture or a body fluid). Therefore, Gram-stained smears can display tremendously variable and heterogeneous background staining, which can obviously affect the algorithm that may target areas without bacteria and miss the most relevant microscopic zones.
The overarching objective of this study was to assess the performances of automated digital imaging of Gram-stained slides with on-screen reading against manual microscopy. Automated digital imaging was performed by the Metafer slide scanning platform that permits scanning, digitalization, and archiving of slides automatically, even in a batch mode.

Slide collection and workup
A total of 443 identified Gram-stained slides from positive blood cultures (n = 157), respiratory tract specimens (n = 142), and sterile site specimens (n = 144) were collected in the clinical bacteriology laboratory of Geneva University Hospitals between February and June 2020. All the slides included in this study were prepared by the Copan WASP® during the routine clinical workup. Importantly, the slides were chosen without any preselection (e.g., staining quality, abundance, or identity of the microorganisms), and they were not pre-screened by automated digital imaging in order to ideally capture the variability of routine specimens. One hundred additional slides, encompassing all the specimen types analyzed in this study, were used during the training period to validate and evaluate the different parameters of the Metafer slide scanning and the imaging platform, according to the manufacturer's instructions (MetaSystems Hard & Software GmbH, Altlussheim, Germany). These slides were not included in the subsequent study period. Slides from respiratory tract and sterile site specimens were stained using a manual method. In contrast, all slides from positive blood cultures were stained using the PREVI® Color Gram (BioMérieux, Marcy L'Etoile, France). The slides analysis for each workflow was performed by four experienced laboratory technologists and two clinical microbiologists by rotating after a training period, in order to avoid any learning bias. Importantly, all were blinded from the results obtained using the other method.

Culture diagnostic workup
All specimens included in this study were processed on the WASPLab following the protocols previously published [1][2][3].

Discrepant results
The results of automated digital imaging with on-screen reading were compared to the manual microscopy. When both methods agreed, we considered the results as correct, and no further examination was carried out. Whenever the methods gave discrepant results (i.e., negative smear or one or more morphologies was/were not reported), we reviewed the digital images and the glass slides by manual microscopy to avoid incorrectly read smears. For the remaining discrepant slides, the Gram strain results were assessed against culture results (Fig. 1). The final result was a consensus of multiple independent reader interpretations.

Metafer slide scanning and imaging platform
In this study, we used a commercial off-the-shelf software. All the 443 slides included in this study plus the 100 slides used during the training period were imaged without coverslips  Fig. 1 Algorithm of results assessment between automated digital imaging with on-screen reading and manual microscopy using a Metafer slide scanning and imaging platform with a 160-slide-capacity automated slide loader equipped with × 10 and × 40 magnification objectives (Carl Zeiss AG, Oberkochen, Germany) and automatic random access of slides. The × 10 magnification was used for collecting images spanning the whole slide. Collected images were then stitched together to create a single digital picture of the Gram smear. Right after, 20 images were taken with seven focal planes for each such picture using a × 40 oil immersion lens magnification for defined areas, according to the manufacturer's instructions. On-screen reading of such digital images was performed using the Metafer 5 software.
According to culture results, 6.5% (2/31) and 9.7% (3/31) of the remaining discrepant results were smear negative and culture positive for manual microscopy and automated digital imaging with on-screen reading, respectively. Additionally, for 32.3% (10/31) of the remaining discrepant results, only oropharyngeal flora was observed on culture media. The assessment of the 31 discrepant reviewed slides against culture highlighted that 42% (13/31) and 38.7% (12/31) were discordant with the culture results for manual microscopy and automated digital imaging with on-screen reading, respectively ( Table 2). The performances of the two compared methods according to culture results are depicted in Table 3.

Discussion
The Gram stain belongs to those tests routinely performed in clinical microbiology laboratories that are prone to variability and to the subjectivity of their interpretation [7,8]. In many cases, Gram stain errors can have a significant clinical impact, especially for sterile site specimens and blood cultures, underlining why clinical microbiology laboratories perform frequent quality controls to monitor the correlation between Gram stain results and cultures. Several reports have examined Gram stain errors rates and highlighted the major drivers of such errors [9][10][11][12]. Moreover, the interpretation of Gram stain results remains labor-intensive, time-consuming, highly subjective, and strongly dependent on the specimen types and on the smear quality. Nowadays, the steadily increasing workload for clinical analyses challenges clinical microbiology laboratories, facing the divergent needs to improve quality, productivity, and turn-around times while simultaneously rationalizing the laboratory technologists' workforce. Some of these challenges can therefore further impact the Gram stain errors rates by precluding a daily review of smears that showed discrepant results with cultures, and decrease the ongoing quality control. To decrease the rates of discrepant results for sterile fluid samples, an additional slide is systematically performed and stained with acridine orange in our laboratory, even if the sensitivity of Gram and acridine orange-   staining remains suboptimal compared to culture in the rapid diagnosis of septic arthritis [13].
Using automated digital imaging with on-screen reading, we assessed the overall slide classification accuracy on the 443 Gram-stained smears which were previously classified by manually microscopy. The overall agreement between both methods was 77%. However, after a subsequent review of the discrepant slides, the overall agreement reached 93%. The rate of discrepant results was markedly different between the three specimen types included in this study. Specifically, the agreements between the compared methods according to the specimen types were 99.4%, 85.2%, and 93.8% for positive blood cultures, respiratory tract specimens, and sterile site specimens, respectively. Despite the fact that not all observed bacteria in a specimen may be recovered in culture due to either a lack of viability or overgrowth by a more predominant organism(s), the overall agreement between the compared methods and the culture results reached 78% (345/443) and 79% (349/ 443) for manual microscopy and automated digital imaging using on-screen reading, respectively.
In the context of a multicenter evaluation of Gram stain error study, Samuel et al. reported that 24% of discrepant Gram strain results were linked to interpretation errors by the technologists, across the different study sites [11,12]. Based on the observations made during our study, specific factors were highlighted as the cause of Gram stain errors using the automated digital imaging with on-screen reading: 1) the recognition of microorganism morphologies on-screen can appear very different and more challenging to identify as compared to routine manual microscopy, 2) the nature of smear preparation, and 3) the thick smears with high cellular content are also especially challenging. To mitigate such errors, smears with inadequate material should be repeated in order to increase the number and the quality of fields examined in addition to the double review of the smears. While this approach might reduce error rates, the logistics appears arduous. Double review of smears can therefore be routinely performed only for a subset of specimens, focusing for example only on blood cultures and sterile fluids. Finally, the reporting and categorization of Gram stain errors by types of error and technologists can help revealing patterns, for targeted review or for additional training.

Conclusion
While the results reported in this study were not surprising given the subjective nature of Gram stains, they emphasize that on-screen reading is challenging even to experienced professionals; the laboratory technologists should therefore benefit from additional and specific training coupled to performance assessment. Additionally, the monitoring of Gramstain errors, which is facilitated by automated digital imaging, represents a crucial step in the process of improving the NPV negative predictive value, PPV positive predictive value quality of Gram stain results. Automated digital imaging of Gram-stained slides permits improved diagnostic workflow by facilitating the slides review and the exchange of information and by building educational picture libraries containing challenging smears that are the source of the most frequent errors. Finally, on-screen reading of digital images affords huge practical and ergonomic advantages as compared to the tedious manual microscopy and constitutes a useful complement to manual microscopy.
Code availability Not applicable.
Availability of data and material Not applicable.
Authors' contributions AF: performed the analysis and monitored, compiled, and analyzed the data. NA, LR, VB, MT, and GR: performed the analysis. NV and JS: reviewed the manuscript. AC: designed and conceptualized the study, supervision of the procedures, and validation of the data and wrote the manuscript. All authors contributed to the article and approved the submitted version.
Funding Open Access funding provided by Université de Genève. This study was performed by using internal funding.

Declarations
Ethics approval In accordance with local ethical committee, routine clinical laboratories of our institution may use biological sample leftovers for method development after irreversible anonymization of data. The official name of the ethics committee is "Commission cantonale d'éthique de la recherche (CCER)" https://www.hug-ge.ch/ethique

Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.