Background

Oxygen saturation measured by pulse oximetry (SpO2) is a non-invasive approximation of arterial oxygen saturation and is considered the fifth vital sign in clinical assessment [1,2,3]. In clinical practice, monitoring of SpO2 is required to titrate oxygen therapy and so avoid the risks of both hypoxaemia and hyperoxaemia [1, 2].

Assessment of agreement between the gold standard arterial blood gas (ABG) measurement of oxygen saturation (SaO2) and SpO2 is essential for the interpretation and use of pulse oximetry values. It is also essential for the development of safe and practical recommendations for SpO2 targets for the titration of oxygen therapy. Overestimation of actual SaO2 may mean clinically relevant hypoxaemia is not detected or treated. Conversely, underestimation of actual SaO2 may result in unnecessary oxygen therapy with the associated risks of hyperoxaemia.

The United States regulatory body, the Food and Drug Administration (FDA), requires the accuracy of pulse oximeters to be tested against SaO2 in healthy adults in laboratory settings [4]. In clinical practice, a number of factors influence oximeter accuracy, including the degree of hypoxaemia, hypercapnia, glycosylated haemoglobin (HbA1c), skin pigmentation, movement artefact, peripheral perfusion and the use of nail polish or acrylic nails [3, 5,6,7,8,9,10,11,12]. Clinical studies report that SpO2 can both over- and underestimate SaO2, and that the limits of agreement may be wide [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]. Oximeter accuracy may also differ by oximeter model [7, 8, 12, 18, 19], and manufacturers are continually developing their sensor technology and software algorithms [3]. Previous studies may therefore not be directly relevant to current clinical practice because of the population groups and oximeter models used.

In our recent study of the accuracy of oximeters used in Australian and New Zealand Intensive Care Units (ICUs), we demonstrated a mean bias for SaO2 minus SpO2 of only 0.15%, with limits of agreement of plus or minus 4.4% [18]. The aim of this study was to investigate the agreement between SaO2 and SpO2 measured by oximeters currently in use in Australian and New Zealand hospitals outside the critical care setting, whether on the ward or in the Emergency Department (ED), High Dependency Unit (HDU) or outpatient departments. Secondary objectives were to evaluate the diagnostic performance of SpO2 for detecting hypoxaemia and to investigate factors affecting oximeter accuracy.

Methods

This multicentre prospective non-experimental observational study compared simultaneous SpO2 and SaO2 measurements in inpatients and outpatients at Westmead Hospital in Australia, and Wellington and Christchurch Hospitals in New Zealand. It was prospectively registered on the Australian and New Zealand Clinical Trials Registry (ACTRN12614001257651). Ethical approval was obtained from the Northern B Ethics Committee in New Zealand (14/NTB/115) and the Western Sydney Local Health District Human Research Ethics Committee in Australia (LNR/14/WMEAD/387).

Patients aged 16 years or older who were to have an ABG measurement as part of routine clinical care were recruited. In New Zealand, full written informed consent was provided by participants, or by their next of kin if participants were unable to consent (for example, because they were too unwell). Patients were not recruited if they had a diagnosis of sickle cell anaemia, methaemoglobinaemia or carbon monoxide (CO) poisoning, or if they had previously been recruited to the study with paired SpO2 and SaO2 values successfully recorded. They could also be excluded for any other condition that, at the investigator’s discretion, was believed to present a safety risk or to affect the feasibility of the study or the interpretation of its results.

Participants were identified in hospital wards and outpatient clinics. Demographic data were recorded. Skin colour was assessed using the Fitzpatrick scale [33].

SpO2 was measured during a clinically indicated ABG. The oximeter probe was in place for at least 10 s before the ABG, or longer if indicated by the manufacturer’s instructions. SpO2 was measured with an earlobe or finger probe, depending on departmental policy and on what the staff member responsible for oximetry would usually use to monitor that patient. If a finger probe was used, it was placed on the index finger contralateral to the ABG sampling site. Where possible, nail polish was removed before measurement.

The SpO2 value recorded was the value displayed on the oximeter when blood was first seen entering the ABG collection vial. Whether the participant was receiving supplementary oxygen at the time of the ABG was also recorded. Measurements paired with ABG samples subsequently identified as venous or otherwise unusable (e.g. a sample too small for analysis) were excluded. The oximeter and ABG analyser models were recorded. Data recorded from the ABG were SaO2, partial pressure of oxygen (PaO2), partial pressure of carbon dioxide (PaCO2), carboxyhaemoglobin (CoHb), methaemoglobin (MetHb) and HbA1c, if measured as part of clinical practice. Investigators were asked to record any concerns about oximeter accuracy, such as nail polish that had not been removed, a poor oximeter signal or patient movement. Participants for whom a concern about oximeter accuracy was reported were not excluded from the analyses.

Bland-Altman plots, with estimation of the bias and limits of agreement, were used to describe the agreement between SpO2 and SaO2 measurements, with SaO2 as the reference standard.
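As a minimal sketch of the Bland-Altman calculation (not the SAS code used in the study), the bias is the mean of the paired SpO2 minus SaO2 differences and the 95% limits of agreement are the bias plus or minus 1.96 standard deviations of those differences. The function name and the readings below are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(spo2, sao2, z=1.96):
    """Return (bias, lower_loa, upper_loa) for SpO2 minus SaO2.

    Bias is the mean paired difference; the limits of agreement are
    bias +/- z * SD of the differences (z = 1.96 gives 95% limits).
    """
    if len(spo2) != len(sao2) or len(spo2) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [p - a for p, a in zip(spo2, sao2)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired readings (%), not study data.
spo2 = [95, 92, 97, 89, 94, 98]
sao2 = [96, 94, 98, 91, 95, 98]
bias, lo, hi = bland_altman(spo2, sao2)
print(f"bias {bias:.2f}%, limits of agreement {lo:.2f}% to {hi:.2f}%")
```

A negative bias here means SpO2 reads lower than SaO2 on average, matching the SpO2 minus SaO2 convention used in the Results.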

The diagnostic performance of SpO2 < 90% for detecting hypoxaemia, defined either as SaO2 < 90% or as PaO2 < 60 mmHg, was evaluated using contingency tables, with sensitivities and specificities estimated by an exact binomial method for proportions. A post hoc analysis of the ability of SaO2 < 90% to detect PaO2 < 60 mmHg was performed using the same methods.
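The sketch below illustrates the contingency-table calculation, assuming that the "exact binomial method" refers to Clopper-Pearson confidence intervals (the text does not name the method). It uses only the Python standard library, finding each interval bound by bisection on the binomial CDF; the function names and data are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); float-safe for n up to roughly 1000."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05, tol=1e-10):
    """Exact (Clopper-Pearson) confidence interval for the proportion x / n."""

    def bisect(f, lo=0.0, hi=1.0):
        # f is increasing in p on [0, 1]; return the p where f(p) = 0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2 (the CDF falls as p rises)
    upper = 1.0 if x == n else bisect(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def diagnostic_performance(test_positive, condition_present, alpha=0.05):
    """Sensitivity and specificity, each as (estimate, lower, upper)."""
    pairs = list(zip(test_positive, condition_present))
    tp = sum(1 for t, c in pairs if t and c)
    fn = sum(1 for t, c in pairs if not t and c)
    tn = sum(1 for t, c in pairs if not t and not c)
    fp = sum(1 for t, c in pairs if t and not c)
    sens = (tp / (tp + fn), *clopper_pearson(tp, tp + fn, alpha))
    spec = (tn / (tn + fp), *clopper_pearson(tn, tn + fp, alpha))
    return {"sensitivity": sens, "specificity": spec}


# Hypothetical values (%), not study data: does SpO2 < 92% detect SaO2 < 90%?
spo2 = [88, 91, 93, 95, 89, 97, 90, 94]
sao2 = [87, 89, 92, 96, 88, 98, 91, 95]
result = diagnostic_performance([s < 92 for s in spo2], [a < 90 for a in sao2])
print(result)
```

With very small counts, as here, the exact intervals are wide; an estimate of 100% sensitivity still carries a lower bound well below 100%.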

The association between SaO2 and bias was illustrated by a scatter plot with a scatterplot smoother and quantified with the Spearman rank correlation coefficient. Associations between bias and the categorical variables in Table 1 were assessed by ANOVA, with the mean difference between categories tested using an F-test. Categories with only one observation were not included in the ANOVA. If important predictors of bias were identified, Bland-Altman methods were to be used to determine whether they also affected the limits of agreement.
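A standard-library sketch of the two association measures follows: Spearman's coefficient computed as the Pearson correlation of average ranks (so tied values are handled), and a one-way ANOVA F statistic for mean bias across categories that drops single-observation categories, as described above. It returns only the F statistic and its degrees of freedom; a P value would need an F-distribution survival function (for example scipy.stats.f.sf), which is omitted here. The function names and data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for differences in mean bias.

    Categories with a single observation are dropped before the test.
    """
    groups = [g for g in groups if len(g) > 1]
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Hypothetical data, not study results.
sao2 = [84, 88, 91, 94, 96, 98]
bias = [-1.0, -2.0, -0.5, -1.5, -1.0, -1.2]
print(spearman_rho(sao2, bias))
f, df_b, df_w = one_way_anova_f([[-1, -2, -1], [0, -1, 0], [5]])
print(f, df_b, df_w)
```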

Table 1 Categorical factors assessed for influence on oximeter accuracy

To estimate the contribution of different oximetry devices to the difference between SpO2 and SaO2, variance components and the associated intra-class correlation coefficient (ICC) for the effect of oximeter, together with best linear unbiased predictors (BLUPs) of the effect of individual oximeters, were estimated using mixed linear models fitted by restricted maximum likelihood (REML).
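The study fitted mixed linear models by REML in SAS. As a simplified illustration only, the sketch below swaps in the classical one-way random-effects ANOVA (method-of-moments) estimator for the variance components, which coincides with REML for balanced data when the estimate is positive but can differ for unbalanced data. It computes the ICC as between-oximeter variance divided by total variance, and approximates each model's BLUP by shrinking its mean difference towards the grand mean. The data, model names and function names are hypothetical.

```python
from statistics import mean


def variance_components(groups):
    """One-way random-effects ANOVA estimates (a stand-in for REML).

    groups: dict mapping oximeter model -> list of (SpO2 - SaO2) differences.
    Returns (between-model variance, residual variance, ICC).
    """
    data = list(groups.values())
    k = len(data)
    sizes = [len(g) for g in data]
    n = sum(sizes)
    grand = mean([v for g in data for v in g])
    msb = sum(len(g) * (mean(g) - grand) ** 2 for g in data) / (k - 1)
    msw = sum((v - mean(g)) ** 2 for g in data for v in g) / (n - k)
    n0 = (n - sum(m * m for m in sizes) / n) / (k - 1)  # effective group size
    sigma_b2 = max(0.0, (msb - msw) / n0)  # truncate negative estimates at 0
    sigma_e2 = msw
    icc = sigma_b2 / (sigma_b2 + sigma_e2)
    return sigma_b2, sigma_e2, icc


def shrunken_effects(groups, sigma_b2, sigma_e2):
    """Approximate BLUPs: each model's deviation from the unweighted grand
    mean, shrunk by sigma_b2 / (sigma_b2 + sigma_e2 / n_i)."""
    grand = mean([v for g in groups.values() for v in g])
    return {
        name: sigma_b2 / (sigma_b2 + sigma_e2 / len(g)) * (mean(g) - grand)
        for name, g in groups.items()
    }


# Hypothetical SpO2 - SaO2 differences (%) by oximeter model, not study data.
diffs_by_model = {
    "model_A": [-1, -2, -1, -2],
    "model_B": [0, -1, 0, -1],
    "model_C": [-2, -3, -2, -1],
}
sb2, se2, icc = variance_components(diffs_by_model)
effects = shrunken_effects(diffs_by_model, sb2, se2)
print(round(sb2, 3), round(se2, 3), round(icc, 3), effects)
```

Models with fewer participants are shrunk more strongly towards the overall mean, which is why BLUPs are preferred to raw per-model means when some models contribute few measurements.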

Analyses were performed using SAS version 9.4.

The planned sample size of 400 was based on three considerations. First, for the analysis of variables that predict the size of the bias, we sought 20 to 40 participants for each degree of freedom in the ANOVA; with six variables, some of which have multiple levels, this required 200 to 400 participants. Second, estimates of the paired SD of the SpO2 to SaO2 difference from patients in a range of clinical settings were 0.55% [6], 2.1% [17] and 2.2% [16]. Two equal-sized groups of 21 participants would give 80% power, with a type I error rate of 5%, to detect a 2% difference in the SpO2 to SaO2 difference between the categories of any variable that might predict bias. Third, estimating variance components for the different pulse oximeters by best linear unbiased predictors required 20 to 25 participants per oximeter brand, and it was estimated that 10 to 20 oximeter brands would be used.
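As a rough check on the second consideration, the usual normal-approximation formula for two equal groups is n = 2(z_{1-α/2} + z_{1-β})² σ² / δ² per group. The text does not state which SD or formula produced 21 per group; with the largest quoted SD (σ = 2.2%) and δ = 2%, this approximation gives 19, and a calculation based on the t distribution would give a somewhat larger figure. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Participants per group for a two-sided, two-sample comparison of
    means with equal group sizes (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd**2 / delta**2)


for sd in (0.55, 2.1, 2.2):  # paired SDs quoted from [6], [17], [16]
    print(sd, n_per_group(sd, delta=2.0))
```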

Results

Participants

Four hundred patients were recruited: 253 at Christchurch, 103 at Wellington and 44 at Westmead Hospital (Fig. 1). Participant characteristics and details of the pulse oximeters and ABG analysers are presented in Table 2.

Fig. 1 Flow of participants through the study

Table 2 Participant characteristics (N = 400)*

Agreement between SpO2 and SaO2

The bias for SpO2 minus SaO2 was −1.2%, with limits of agreement of −4.4 to 2.0%. The Bland-Altman plot is shown in Fig. 2. In 10/400 (2.5%) participants the SpO2 was at least 4% lower than the SaO2; the investigator reported concern with oximeter accuracy in one of these. In 3/400 (0.8%) participants the SpO2 was at least 4% higher than the SaO2; again, the investigator reported concern with oximeter accuracy in one of these. Characteristics of these participants are presented in the Online Additional file 1: Table S2.

Fig. 2 Bland-Altman plot for SpO2 versus SaO2

SaO2: Oxygen saturation measured by arterial blood gas sample, SpO2: Oxygen saturation measured by standard pulse oximeter. The solid reference line is the mean bias and the dashed reference lines are the limits of agreement around this mean bias.

Detection of hypoxaemia

The sensitivity and specificity of SpO2 < 90% or < 92% for detecting SaO2 < 90%, of SpO2 < 90% for detecting PaO2 < 60 mmHg, and of SaO2 < 90% for detecting PaO2 < 60 mmHg are shown in Table 3. The ROC curve for SpO2 to detect SaO2 < 90% is shown in Fig. 3. SpO2 < 92% had 100% sensitivity and 84.4% specificity for detecting SaO2 < 90%, and 95.1% sensitivity and 90.0% specificity for detecting PaO2 < 60 mmHg. Tabulated values and an ROC curve are presented in the Online Additional file 1 (Tables S3 and S4, Figure S1). Participants’ values tended to lie to the left of the predicted oxyhaemoglobin dissociation curve (Online Additional file 1: Figure S2) [34]. In 13/400 (3%) participants the PaO2 was > 100 mmHg. Twelve of these participants had a SpO2 > 96%; the remaining participant had a SpO2 of 96%, with a PaO2 of 142 mmHg and a SaO2 of 99%.
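The source does not state which model of the oxyhaemoglobin dissociation curve reference [34] provides. As a stand-in, the sketch below assumes the Severinghaus (1979) equation for standard conditions, SaO2 = 100 / (23400 / (PaO2³ + 150·PaO2) + 1) with PaO2 in mmHg, and classifies a paired measurement as lying to the left of the curve when its measured SaO2 exceeds the predicted value. Under this assumption, inverting the curve places a saturation of 90% at a PaO2 of roughly 59 mmHg. Function names are hypothetical.

```python
def severinghaus_sat(pao2_mmhg):
    """Predicted SaO2 (%) from PaO2 (mmHg), Severinghaus (1979) equation,
    assumed here for standard conditions (pH 7.40, 37 C)."""
    p = pao2_mmhg
    return 100.0 / (23400.0 / (p**3 + 150.0 * p) + 1.0)


def pao2_for_sat(target_pct, lo=1.0, hi=600.0, tol=1e-9):
    """Invert the curve by bisection; predicted saturation rises with PaO2."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if severinghaus_sat(mid) < target_pct:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def left_of_curve(pao2_mmhg, sao2_pct):
    """True if measured SaO2 is higher than predicted for that PaO2,
    i.e. the point lies to the left of (above) the standard curve."""
    return sao2_pct > severinghaus_sat(pao2_mmhg)


# Hypothetical inputs, not study data.
print(round(severinghaus_sat(60), 1))  # predicted saturation at PaO2 60 mmHg
print(round(pao2_for_sat(90), 1))  # PaO2 at which predicted saturation is 90%
print(left_of_curve(55, 92))
```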

Table 3 Diagnostic performance of SpO2 and SaO2

Fig. 3 ROC curve for SpO2 to predict SaO2 < 90%. The c-statistic for the logistic regression, representing the area under the ROC curve, was 0.986. SaO2: Oxygen saturation measured by arterial blood gas sample, SpO2: Oxygen saturation measured by standard pulse oximeter
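If SpO2 is the only predictor in the logistic regression (the text does not list other covariates, so this is an assumption) and its fitted coefficient is negative, the fitted probabilities are a monotone function of SpO2 and the c-statistic equals the concordance of SpO2 itself: the proportion of (hypoxaemic, non-hypoxaemic) pairs in which the hypoxaemic participant has the lower SpO2, with ties counted as one half. A minimal sketch with a hypothetical function name and data:

```python
def c_statistic(spo2, hypoxaemic):
    """Concordance (area under the ROC curve) for SpO2 as a predictor of
    hypoxaemia, where LOWER SpO2 should indicate hypoxaemia."""
    cases = [s for s, h in zip(spo2, hypoxaemic) if h]
    controls = [s for s, h in zip(spo2, hypoxaemic) if not h]
    if not cases or not controls:
        raise ValueError("need both hypoxaemic and non-hypoxaemic participants")
    score = 0.0
    for a in cases:
        for b in controls:
            if a < b:
                score += 1.0  # concordant pair
            elif a == b:
                score += 0.5  # tied pair
    return score / (len(cases) * len(controls))


# Hypothetical data: SpO2 (%) and whether SaO2 < 90%.
spo2 = [85, 88, 90, 93, 95, 91]
hypoxaemic = [True, True, False, False, False, True]
print(c_statistic(spo2, hypoxaemic))
```

The pairwise loop is quadratic in the number of participants, which is unproblematic at a few hundred observations.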

Factors potentially influencing oximeter accuracy

There was no statistical evidence of an association between SaO2 and the bias between SpO2 and SaO2 (Spearman coefficient 0.003, P = 0.94). Of the other factors in Table 1, only a diagnosis of diabetes was identified as a predictor of bias (P = 0.05). The bias was −0.8% (95% limits of agreement −4.4 to 2.8%) in participants with diabetes and −1.2% (−4.4 to 2.0%) in those without diabetes. Detailed results are presented in the Online Additional file 1 (Figure S3 and Table S5).

At least 14 different oximeter models were used. The most common were the Nonin Avant 9700 in 103 participants (26%), the Masimo Rainbow Radical 7 in 92 participants (23%) and the Nonin Avant 4000 in 76 participants (19%) (see Additional file 1: Table S1 for all models). The estimated variance components were 0.16 for oximeter brand and 2.48 for the residual, so that approximately 6% of the variation in the SpO2 minus SaO2 difference was due to oximeter brand (intra-class correlation coefficient 0.16/[0.16 + 2.48] ≈ 0.06), with the remaining 94% being residual variation. Detailed results by oximeter are shown in the Online Additional file 1: Table S6.
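Assuming the usual random-intercept definition, in which the intra-class correlation for oximeter brand is the oximeter variance component divided by the total variance, the reported components give:

```latex
\[
\mathrm{ICC}_{\text{oximeter}}
  = \frac{\sigma^2_{\text{oximeter}}}{\sigma^2_{\text{oximeter}} + \sigma^2_{\text{residual}}}
  = \frac{0.16}{0.16 + 2.48} = \frac{0.16}{2.64} \approx 0.061,
\qquad
\frac{\sigma^2_{\text{residual}}}{\sigma^2_{\text{oximeter}} + \sigma^2_{\text{residual}}}
  = \frac{2.48}{2.64} \approx 0.939
\]
```

Oximeter brand therefore accounts for about 6% of the total variance, and residual variation for about 94%.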

Investigators reported concern with oximeter accuracy in 16 participants, nine of whom had nail polish, an acrylic nail or a double nail. Other causes for concern are presented in the Online Additional file 1: Table S1.

Discussion

The bias and limits of agreement between SpO2 and SaO2 suggest that pulse oximetry is an accurate method of assessing SaO2 in most adult patients in the clinical setting. However, in a small number of participants, potentially clinically important differences between SpO2 and SaO2 could affect patient assessment and management. A practical rule that can be derived from these data is that a SpO2 ≥ 92% effectively rules out hypoxaemia, defined as SaO2 < 90%. There were no clinically significant differences in oximeter accuracy by absolute level of SaO2, hospital location, numerous clinical characteristics or oximeter brand.

The magnitude of the bias and limits of agreement across the range of oximeters in this study suggests that, overall, they perform similarly to or better than the oximeters used in many clinical studies performed in the last 10 years [5, 6, 8, 10,11,12, 18,19,20,21,22,23,24,25,26, 28, 30, 31]. This is consistent with continual improvements in oximeter sensor technology and software by manufacturers [3]. In particular, the bias and limits of agreement for SaO2 minus SpO2 were similar to the values recently obtained in critically unwell patients in the ICU setting (0.15%, limits of agreement plus or minus 4.4%) [18].

The negative bias of −1.2%, although small, meant that the oximeters tended to underestimate SaO2. Such underestimation may overstate the risk of hypoxaemia and could lead to more liberal oxygen therapy than required. SpO2 underestimated SaO2 by at least 4% in around 3% of participants, and overestimated it by at least 4% in less than 1%. Thus, although the oximeters performed well overall, potentially clinically relevant differences between SpO2 and SaO2 remained in a small proportion of participants. In most participants whose SpO2 and SaO2 differed by at least 4%, the investigators did not report any concern with oximeter accuracy. This highlights the difficulty of identifying when an oximetry value is incorrect and emphasises the importance of guideline recommendations to interpret oximetry values in clinical context [3].

The TSANZ [2] and BTS [1] guidelines for acute oxygen therapy both recommend pulse oximetry as a vital sign and as a tool to titrate oxygen therapy to a target oxygen saturation range. The TSANZ recommends that, in patients not at risk of hypercapnic respiratory failure, oxygen be titrated to a SpO2 target range of 92 to 96% [2]. This range was developed to reduce the risks of both hyperoxaemia and hypoxaemia while recognising potential limitations in oximeter accuracy [35]. The lower limit of 92% is supported by our finding that a SpO2 ≥ 92% indicated that hypoxaemia (SaO2 < 90%) was not present. The recommended upper limit of 96%, intended to avoid hyperoxaemia, is supported by the finding that 12 of the 13 participants with a PaO2 greater than 100 mmHg had a SpO2 above 96%.

A SpO2 < 90% had a specificity of only 70.5% for identifying a PaO2 < 60 mmHg, and a SaO2 < 90% a specificity of only 54.1%. These values are in keeping with most participants’ values lying to the left of the predicted oxyhaemoglobin dissociation curve. Consistent with the recommendations of the TSANZ Oximetry Guidelines [3], these findings highlight the limitations of estimating PaO2 from saturation values, and vice versa.

Patients with sickle cell anaemia, methaemoglobinaemia or CO poisoning were excluded from the study, and nail polish was removed where possible, because these factors are well established to affect oximeter readings [3]. SaO2, oximeter model and the numerous clinical variables examined were not found to significantly affect oximeter accuracy. However, it was not possible to evaluate the effect of earlobe oximetry, Fitzpatrick skin type V or VI, or ED location on accuracy, because each of these categories contained only one participant.

This study had the advantage of a multicentre design and the use of a range of oximeters routinely available to clinical staff in a variety of hospital settings. A wide range of adult patients were included, in terms of both presenting diagnosis and illness severity. Although SaO2 values ranged from 72 to 100%, the results cannot be applied to patients with a SaO2 below 70%, in whom oximeter inaccuracy is well recognised [3]. The results may not be applicable to paediatric patients or to adult patients in theatre, ICU or ED, especially as a variety of factors specific to these patients have previously been identified as affecting oximeter accuracy [11, 15, 17, 25, 27, 28, 30, 31]. With only one participant with a Fitzpatrick score of V, and none with VI, the findings may not be applicable to patients with darker skin pigmentation. This is especially important because oximeter accuracy has been shown to decrease as pigmentation increases, particularly at lower SaO2 levels, including in oximeters of the same brands as some of those used in our study (Masimo Radical and Nonin 9700) [7].

A single paired oximeter and ABG measurement was used from each participant, which avoids potential bias from repeated measures in the same participant. However, this meant that we could not assess the ability of SpO2 to detect changes in SaO2 over time.

Conclusions

Overall, the oximeters in this study had good accuracy in determining individual SaO2 values and in detecting hypoxaemia in a range of clinical settings. The use of a SpO2 of 92% as the lower boundary for the titration of oxygen therapy was supported by the 100% sensitivity of SpO2 < 92% for identifying hypoxaemia (SaO2 < 90%). In a small number of participants, discrepancies between SpO2 and SaO2 could have implications for patient assessment and management. This highlights the importance of interpreting SpO2 within the clinical context.