Strongly influenced by the publication of “To Err Is Human”, awareness of the need to improve patient safety has increased over the last two decades [1]. This has led to deliberations on how to actually improve quality and safety, as described in the Institute of Medicine report “Improving Diagnosis in Health Care” [2]. In particular, it has been shown that laboratory support is critical in establishing a safe and efficient diagnostic process, potentially influencing the majority of clinical decision-making while accounting for only about 2% of direct healthcare cost [3, 4]. During the recent months of the SARS-CoV-2 pandemic, laboratory support has been critical in supporting “rapid and effective contact tracing, implementation of infection prevention and control measures in accordance with national recommendations, and adequate support to the patient” [5].

Altogether, laboratory-based diagnostics appear to be an effective and efficient tool, albeit not always a standardized one [6,7,8]. Benchmarking overall diagnostic laboratory performance meets the same challenges as harmonizing external quality assessment schemes, but the challenges may be even more basic. Indeed, apart from the Q-Probes program of the College of American Pathologists and the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) Working Group on Laboratory Errors and Patient Safety (WG-LEPS), there exist few strategic efforts to benchmark and thus support standardization regarding the level of quality or efficiency in diagnostic laboratories [9,10,11,12].

Interest in standardization and quality indicators has been increasing in recent years, even though the actual number of laboratories participating in quality indicator schemes appears to be stagnant. While most current initiatives, e.g. regarding execution of the IFCC scheme, still appear to be on a national basis, ongoing efforts have slowly borne fruit [13, 14]. The overall picture has led to the coining of the term “quality indicator paradox”, which points to a gap between the interest of laboratories in improving efficiency, quality, and patient safety and their actual participation in indicator schemes, and raises the question of how to speed up the process [15].

Before estimating the relation between laboratory operations (one aspect being, e.g., quality) and patient safety, standards of measurement need to be established for both domains. But because laboratory management methods have traditionally been developed in a rather hands-on manner and have not yet been well established in the academic literature, the number of clearly defined and commonly used key performance indicators is limited [16]. There are, however, a few that appear to be widely used with at least similar definitions of measures. Examples of the latter are grand totals (such as number of patients, number of orders, number of samples), temporal measures (various measures for turn-around times), and resource measures (such as number of full-time equivalents and laboratory space available).

The literature on laboratory performance benchmarking, accordingly, is sparse. Data for the Asia-Pacific region have been published [17], but there exist no comparable publications for Europe in general and Germany, Austria, and Switzerland in particular. Due to the heterogeneity of healthcare systems and educational background of laboratory professionals in Europe, direct generalizability of estimates may further be limited [18]. Thus the need arose for a common estimator for laboratory performance with similar data gathering procedures in different countries as a basis for safe and efficient health care.

In the face of this rather unconstrained situation, the research question of this study was twofold. First, which measures might be in use on a large enough scale to allow comparison of laboratories in Europe? Second, which results might a benchmark pilot survey yield for Germany, Austria, and Switzerland?

Material and methods

Development of questionnaire

Since there existed no published questionnaires for laboratory benchmarking, a new one needed to be designed for this study. The design process was iterative and took into consideration information that could be extracted from the literature, author experience and feedback from informal focus groups of laboratory professionals using a total of three iterations. Focus groups were composed of about 20 people each, including medical doctors, technicians, workflow experts, biologists, and laboratory directors, and varied slightly due to availability of participants. The resulting questionnaire consisted of 50 items in total, either relating to general information or more specifically addressing the topics of “operational performance”, “integrated clinical care performance”, and “financial sustainability”.

Enrolment of participants

The population under consideration consisted of laboratories that had a broad range of diagnostic providers, i.e. Abbott, Roche, Siemens, Sysmex, Beckman, Werfen, Biomerieux, Becton Dickinson, Stago, and Diasorin, among others. They were contacted by Abbott customer representatives and asked whether they would participate in the study. More specifically, Abbott representatives in Germany, Austria, and Switzerland were asked to approach general medical laboratories both with and without Abbott equipment. As it was a voluntary survey there may be some bias towards laboratories using Abbott equipment; however, the items in the survey were not specific to the use of Abbott products. The questionnaire was then filled out online using the platform SurveyMonkey with support of Abbott customer representatives. The survey was completed in paper form and subsequently entered manually only where filling out online was not directly possible.

Statistical methods

Results are generally presented as numbers and percentages for nominal scales, median and interquartile range (IQR) for ordinal scales, and mean and standard deviation for higher measurement levels. Ordinary least squares regression and corresponding 95% confidence intervals were used to quantify the relations between the subscale “operational performance” and the other subscales. Any p-values given are to be considered exploratory for this pilot study. The χ²-test or Fisher’s exact test was applied as appropriate; otherwise the test used is specified in the text. Statistics and visualization were performed using the free software environment R version 3.6.3 [19].

Scoring of questionnaire items and building of subscales

To examine the relation between operational performance, integrated clinical care performance, and financial sustainability, relevant items were arithmetically combined to form subscales of the questionnaire: each item was normalized to the range of 0–100, then the mean was taken for the items contained in each subscale. For operational performance, these were “use of IT functionalities”, “percent of orders in electronic form”, “percent of results autovalidated”, “number of key performance indicators (KPIs) measured”, “number of samples processed per day”, and “advanced TAT monitoring”. For integrated clinical care performance, these were “frequency of interaction with diagnostic subcommittees and physicians”, “process in place to control appropriate testing”, “interaction with diagnostic subcommittees”, “measuring the lab’s impact on patient outcomes”, and “scope of service provided to physicians”. For financial sustainability these were “shifts per day”, “labor situation” (referring to the possible challenge of staff shortage), “staff productivity” calculated as samples per full-time equivalent (FTE) and tests per FTE, “productivity by analyzer” calculated as tests per analyzer, and “workspace utilization” calculated as samples per m² and tests per m².
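The scoring rule described above (normalize each item to 0–100, then average the items of a subscale) can be sketched as follows. This is a minimal illustration only: the text does not state which normalization was used, so min-max scaling across laboratories is an assumption, as are the function names, the shortened item names, and the three made-up laboratories.

```python
from statistics import mean


def normalize_0_100(values):
    """Min-max scale raw item values to the range 0-100 (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: an item with no spread gets the midpoint for every lab.
        return [50.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def subscale_scores(items):
    """Return one subscale score per laboratory.

    items maps an item name to a list of raw values, one per laboratory,
    with laboratories in the same order for every item.
    """
    normalized = [normalize_0_100(vals) for vals in items.values()]
    # zip(*...) regroups the normalized values by laboratory.
    return [mean(lab_values) for lab_values in zip(*normalized)]


# Hypothetical raw answers for three laboratories (not survey data).
operational = {
    "percent_orders_electronic": [10, 55, 100],
    "percent_results_autovalidated": [0, 20, 80],
    "kpis_measured": [2, 4, 10],
}
scores = subscale_scores(operational)
```

With min-max scaling, a subscale score depends on the range observed in the sample, so scores from different survey rounds would only be comparable if the same reference range were reused.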


Results

Overall, 65 laboratories responded to the survey in Germany (42), Austria (17), and Switzerland (6). About two thirds (42) of the responding laboratories were hospital laboratories and one third (23) were commercial laboratories. About one third of hospital laboratories (14) and two thirds of commercial laboratories (15) processed more than 1000 patients per day. Hospital laboratories served on average about 1000 patients per day, while commercial laboratories served more than twice this number (about 2300 patients per day) and sometimes processed more than 6000 primary tubes in clinical chemistry alone. Detailed results beyond those presented in the body of this manuscript can be found in the online supplement. Formal quality management systems were established in little more than half of the laboratories (35; 54%), with certifications/accreditations other than ISO 9000 or ISO 15189 playing almost no role whatsoever. Only one laboratory was certified according to ISO 14000, and no laboratory at all used JCIA (Joint Commission International Accreditation), CAP (College of American Pathologists) or MLE (Medical Laboratory Evaluation). Informal quality improvement programs, on the other hand, were widely used. Most dominant were survey-based approaches (employee satisfaction survey, clinician satisfaction survey, patient satisfaction survey, in that order) and continuous development programs for employees. Again, there was a lower, in this case even much lower, use of formalized approaches like Lean Six Sigma or Activity Based Costing (ABC).
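As an illustration of the exploratory tests named in the statistical methods, the split in daily patient volume reported above (14 of 42 hospital laboratories versus 15 of 23 commercial laboratories with more than 1000 patients per day) can be arranged as a 2×2 table and passed to a two-sided Fisher’s exact test. The sketch below is not taken from the study's analysis: the function name is hypothetical, and whether this particular comparison was tested by the authors is not stated in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the
    # observed one; the small tolerance guards against rounding ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: hospital, commercial; columns: >1000 patients/day, <=1000 patients/day.
p_volume = fisher_exact_two_sided(14, 28, 15, 8)
```

In line with the methods section, any p-value obtained this way is exploratory only.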

Operational performance

Besides turn-around time, which was represented by several items, 10 KPIs were assessed in the survey. These were “Employee productivity (e.g. samples per full time equivalent)”, “Workspace utilization (e.g. tests per square meter)”, “Amount of implemented auto-validation rules”, “Reduction of consumable waste”, “Reduction of expired reagent stock”, “Return on investment (e.g. total cost of ownership, total value of ownership)”, “Instrument noise levels”, “Systems uptime/downtime”, “Rerun rates”, and “Blood smear review rates”. For detailed results see the Online Supplement; only the five more commonly used KPIs are discussed here.

Metrics used to monitor operational performance varied widely. Besides turn-around time, systems uptime/downtime was used as a KPI most commonly, with instrument noise levels and expired reagent stock following as close second and third, respectively. Rerun rates and blood smear reviews were used by about a third of the laboratories. The use of all other KPIs listed in the survey was positively indicated by 25% or less of the laboratories. Turn-around time was identified as a KPI by about two thirds of hospital laboratories and slightly more than half of commercial laboratories. The various types of TAT examined in this survey can be identified in Fig. 1, with “Lab TAT” dominating and “Pre-Lab TAT” trailing. Only 14% of laboratories measured “Brain-to-Brain TAT” or “Clinician Expectation Time”, defined as the time between physician order and physician access to laboratory results [20, 21].

Fig. 1

Types of TAT, using the example of phlebotomy (turn-around times for other specimens can be defined in a similar manner)

While TAT was defined as a KPI by the majority of laboratories answering this survey, continuous monitoring or frequent review did not appear to be the main concern. Only about a third of laboratories monitored TAT in real time, and about 20% each monitored it daily and/or weekly. This may partly be due to the software used to monitor TAT: about 80% of hospital laboratories monitoring TAT used only spreadsheet software like Excel, this proportion still being 50% for commercial laboratories. Comprehensive middleware IT solutions or dedicated TAT monitoring software were thus used by only 20% of hospital laboratories and 50% of commercial laboratories monitoring TAT.

Consistent with the relatively low use of KPIs in general and various versions of TAT in particular, less than a third of laboratories overall have a fully electronic order process, this number being lower than 10% for commercial laboratories. Apart from the functionalities commonly implemented in the laboratory information system (LIS), like result reporting, age/gender-related rules, and basic statistical reporting, advanced use of IT functionality is not widely practiced. This extends to features like autovalidation, which are used by less than 20% of hospital laboratories and less than 40% of commercial laboratories.

Digitalization and mechanical automation appeared to go hand in hand up to a certain degree. While all laboratories processing more than 2000 samples per day used at least preanalytical automation, less than half of the laboratories processing 1000 samples or less per day used preanalytical automation. On the other hand, roughly 75% of laboratories used integrated instruments for clinical chemistry and immunoassays, irrespective of whether they were hospital or commercial laboratories. Track connection was established for the most frequently performed types of analyses (clinical chemistry, immunoassays, and hematology) for only about a quarter of laboratories.

Integrated clinical care

More than 90% of laboratories provide services above and beyond measurements to physicians. Among these are (in descending order) alerts, interpretations and proactive consultations, reflexive test suggestions, as well as diagnostic pathway guidance. Of the six possible services, only real-time decision support appeared to be applied by less than two thirds of laboratories.

Interestingly, only two thirds of laboratories regularly review guidance given for diagnostic pathways, half of which annually, the other half more frequently. Correspondingly, only about 75% of laboratories participate in various diagnostic committees. A similar proportion of hospital laboratories performed physician satisfaction surveys, while only half of commercial laboratories were active in this direction.

In contrast to the relatively high level of activity regarding classical laboratory/physician-interaction, levels of activity were much lower for more recent areas of laboratory professional activity. Among the latter are test utilization management, with less than 20% of laboratories actively engaged, direct measurement of patient outcomes, and combining data digitally with diagnostic statements from other disciplines. Interestingly, activity was highest in the last of the three areas specified, with commercial laboratories leading the way (26%) and hospital laboratories aiming to follow (18% currently, 8% planning relevant activity in the next 12 months).

Financial sustainability

About one third to two thirds of laboratories in the survey faced staff shortages, the latter being more pronounced for higher skilled staff compared to lower skilled staff. Also, the shortage was more pronounced in the commercial than in the hospital sector, the former reporting a major staff shortage in 39% of cases while the latter only in 11% of cases. For lower skilled staff, there was almost no major shortage reported in hospital laboratories, while 13% of commercial laboratories reported a major shortage.

Staff productivity, on the other hand, was higher in the commercial sector than in the hospital sector. In commercial laboratories about 2700 tests were performed per FTE, whereas this number was less than half in hospital laboratories. The number of tests performed per sample was around six in both cases, the number being slightly higher in commercial laboratories (6.5) than in hospital laboratories (5.9).

Automation was associated with higher efficiency in both staff productivity and workspace utilization. Laboratories with a high degree of automation executed about 200 tests per m², while laboratories with a low degree of automation achieved only about half that number. For staff productivity, the association was even stronger, with 2600 tests being achieved per FTE in a high-automation environment and only slightly more than 1000 tests per FTE in a low-automation environment.

Subscales and correlations

A significant correlation (Pearson’s r = 0.37, p = 0.002) between the operational performance and integrated clinical care performance subscales was identified. Individual observations can be found in Fig. 2, where hospital laboratories are shown in black, and commercial laboratories are shown in grey. No significant interaction effect between this relation and type of laboratory could be identified (Supplement Fig. 4).
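To give a feel for the uncertainty around the reported correlation, the following sketch derives an approximate 95% confidence interval and p-value for Pearson’s r from r and the sample size alone, using the Fisher z-transformation. It is a plausibility check under stated assumptions and not the authors’ procedure (the study used R): n = 65 is assumed because all laboratories could be scored on both subscales, the function name is hypothetical, and the normal approximation will differ slightly from an exact t-based test.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_r_interval(r, n, level=0.95):
    """Approximate CI and two-sided p-value for Pearson's r (Fisher z)."""
    z = atanh(r)                      # Fisher z-transform of r
    se = 1.0 / sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (tanh(z - crit * se), tanh(z + crit * se))
    # Normal approximation to the test of H0: rho = 0.
    p_approx = 2 * (1 - NormalDist().cdf(abs(z) / se))
    return ci, p_approx


# Reported r = 0.37; n = 65 is an assumption (all responding laboratories).
ci, p_approx = pearson_r_interval(0.37, 65)
```

An interval that excludes zero is consistent with the significance reported in the text, while its width indicates how imprecisely a correlation is estimated from a pilot sample of this size.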

Fig. 2

Scatterplot of subscales Integrated Clinical Care Performance vs. Operational Performance in arbitrary units (A.U.) (shaded area corresponds to 95% confidence interval for ordinary least squares regression)

Only 34 of 65 laboratories (52%) were able to provide sufficient data to estimate the financial sustainability subscale. Due to this lower de facto sample size, the correlation between the operational performance and financial sustainability subscales, even though of similar magnitude to the correlation between the operational performance and integrated clinical care performance subscales, was significant but would not withstand multiple comparison correction (Pearson’s r = 0.36, p = 0.037; Fig. 3).
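The remark on multiple comparison correction can be made concrete with a short sketch. The text does not name the correction method, so a Bonferroni correction over the three pairwise subscale correlations is assumed here; the function name and the significance level of 0.05 are likewise assumptions.

```python
def survives_bonferroni(p_value, n_comparisons, alpha=0.05):
    """Return True if p_value stays below the Bonferroni-adjusted threshold."""
    return p_value < alpha / n_comparisons


# Assumption: three pairwise comparisons among the three subscales.
N_COMPARISONS = 3

# Reported p-values from the text.
operational_vs_clinical = survives_bonferroni(0.002, N_COMPARISONS)
operational_vs_financial = survives_bonferroni(0.037, N_COMPARISONS)
```

Under these assumptions the adjusted threshold is 0.05 / 3, roughly 0.017, so the first correlation remains below it and the second does not, which matches the statement above.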

Fig. 3

Scatterplot of Subscales Financial Sustainability vs. Operational Performance in arbitrary units (A.U.) (shaded area corresponds to 95% confidence interval for ordinary least squares regression)

In contrast to the financial sustainability subscale and similar to the operational performance subscale, all laboratories surveyed were able to provide sufficient data to estimate the integrated clinical care performance subscale score. No significant correlation could be identified between the integrated clinical care performance and financial sustainability subscales (Supplement Fig. 5). No significant difference (t-test) could be identified between hospital laboratories and commercial laboratories in any of the subscales.


Discussion

Diagnostic laboratories use reference ranges to interpret individual patient results on a daily basis. Indeed, it is generally accepted that for adequate interpretation of laboratory results longitudinal and in particular transversal comparison is indispensable [22]. This raises the question why there is such a dearth of benchmarking data for more general measures of diagnostic laboratory performance. One answer may be that data for adequate transversal comparison of laboratories are simply much harder to get than individual patient data [16]. Even for comparisons that are relatively simple in principle, like those of external quality assessment (EQA) programs, it appears to be difficult for several reasons to consistently achieve participation of laboratories in the absence of a legal obligation [23]. All the more interesting are benchmarking data from diagnostic laboratories willingly sharing information regarding their performance in Europe, and for this pilot study in the region of Germany, Austria, and Switzerland in particular.

The main results of this study can be divided into generalities that count as “common knowledge” in the diagnostic laboratory community and specifics that are quantitatively, and in some cases even qualitatively, interesting. Among the former are the order and sample volume distributions of laboratories, the heterogeneity of KPIs in use, and the relatively low representation of fully electronic ordering processes. Among the latter are the relative lack of formalized quality management systems, the low use of digitalization in general and autovalidation rules in particular (even in commercial laboratories), and the almost nonexistent measurement of patient outcomes. While some of the parameters examined may primarily drive financial outcomes, the need for stronger involvement of the laboratory in the entire diagnosis cycle and therapeutic pathway as an ethical imperative appears evident.

Which trajectories to improve laboratory performance and thus, hopefully, quality and patient safety might be deduced from this study? Quality has traditionally been the main focus of diagnostic laboratories (to the degree that no result is better than the wrong result), underlined by the high presence of informal (quality) improvement programs evident in this survey [24]. Despite that, however, diagnostic performance is variable and time-tested quality management systems, like ISO 9001, have not yet taken hold in all diagnostic laboratories [25]. One key lesson might thus be to further advance the presence of formal certifications like ISO 9001 or accreditations like ISO 15189 in the laboratory community. Furthermore, one might ponder the possibility of introducing more frequent and informative EQA participation aided by, e.g., building public databases of successful EQA participation of individual laboratories.

Successful implementation of ideas as presented above depends on highly skilled and motivated laboratory personnel. This is in line with the tendency towards not only certification but also academization of laboratory personnel and thus maybe laboratory operations in general. Unfortunately, labor shortages are becoming the norm in the diagnostic laboratory community and have also been found in this survey. Specifically, in contrast to the demand for unskilled labor that appears to be largely satisfied, there is a high and increasing gap in demand and supply for skilled labor. Avenues to remedy this situation were not explored by the survey but bringing into focus the added value of laboratory activities for patients and medical personnel alike might just either motivate employees to upgrade their skill levels or even attract more young people to the laboratory profession [26].

Accordingly, the time may be ripe to start discussions in the medical community and with insurance providers on whether data enrichment in, and additional services provided by, the laboratory might help provide better overall value of the entire diagnostic and therapeutic process at lower overall cost, and thus merit additional remuneration for the laboratory. For example, since the typical cost of laboratory service amounts to only about 2% of total healthcare expenditures, there is tremendous opportunity to optimize efficiency in the upstream and downstream (i.e. preanalytic and postanalytic) phases of the total testing process (cf. [27]). The first step would be to add value to the services provided to physicians, e.g. by offering services that are not yet broadly provided, like reflexive test suggestions, proactive consultation, real-time decision support, and diagnostic pathway guidance. Only about two thirds of laboratories on average provide these services, so there is considerable room for improvement using modern clinical decision support tools or even existing technology.

To elaborate on existing technology, this survey has shown that some time-tested ways to optimize efficiency, and sometimes even quality, are employed by only the minority of diagnostic laboratories. In particular, autovalidation rules are employed by less than a third of laboratories overall. Various studies in other fields than that of laboratory diagnostics have generated some evidence that neither humans nor computers alone might best deal with complex challenges, but rather combinations thereof [28]. Indeed, strengths of humans and computers might be augmented, and weaknesses be offset by each other thus allowing for safe services in high-throughput environments [29, 30]. In this sense, not yet widely used autovalidation has the potential to significantly improve on patient safety by more specifically targeting human attention [31, 32].

The main limitations of this study originate in the data generating process. The questionnaire was newly designed, it has not yet been validated, and thus the typical measurement quality criteria of objectivity, reliability, and validity are not necessarily fulfilled. Response behavior might follow social desirability and some items may not be interpreted in the same way by all participants (e.g. real time decision support). Furthermore, selection bias might limit generalizability of results to other laboratories in Europe or even in the region of Germany, Austria, and Switzerland. As an example, one might list the correlation of formal quality management system establishment with laboratory size (cf. section Q9 in the online supplement). Another example might be the lower staff efficiency in hospital laboratories, which is probably due to the existence of night shifts and a higher proportion of specialized analytics. On the other hand, generating a rough estimate of the status quo is the first step towards validating a questionnaire for future studies aiming to generate even more objective, reliable, and valid benchmarks for laboratory performance (cf. [33]).

To conclude, this study for the first time presents a glimpse into recent strategies for performance optimization of laboratories in Germany, Austria, and Switzerland. Much remains to be examined, but the need for further digitalization and automation appears as strong as ever. To optimize the effect of various interventions with respect to patient safety and diagnostic as well as therapeutic quality, the continuous establishing and updating of a reliable set of benchmarks is of prime importance. Future studies are needed to formally validate the questionnaire used in this study and develop benchmarking on a larger scale.