1 Introduction

The effect of skin color on the accuracy of oxygen saturation measured by pulse oximetry (SpO2) has been a topic of discussion for many years [1, 2]. The tendency of some pulse oximeters to overestimate arterial oxygen saturation (SaO2) in healthy dark-skinned volunteer subjects was first reported by Bickler et al. in 2005 [1]. The same investigators later found that this positive bias (tendency of SpO2 to overestimate SaO2) was greatest at SaO2 values below 80%, suggesting an increased risk of missed hypoxemic events [2].

Several recent publications have raised new questions regarding the accuracy of present-day pulse oximeters in different races. Sjoding et al. performed a retrospective, multi-center study of pulse oximeters in ICU patients [3]. They concluded that, “in two large cohorts, Black patients had nearly three times the frequency of occult hypoxemia that was not detected by pulse oximetry as White patients.” As with nearly all the recent reports, Sjoding et al. pooled data from multiple pulse oximeter manufacturers, generalizing claims on pulse oximetry without respect to manufacturer, model, or sensor type. In a more recent clinical study in COVID-19 patients, Crooks et al. found that in the 85–89% SaO2 range, their pulse oximeters overestimated SaO2 by an average of 5.8% in Asians, 3.9% in Blacks, and 2.4% in Whites [4]. In another retrospective clinical study, Burnett et al. found that in 151,000 paired readings of SpO2 versus SaO2, the pulse oximeters showed an incidence of occult hypoxemia (i.e., missed hypoxemic events) of 2.1% in Blacks, 1.8% in Hispanics, and 1.1% in Whites [5]. Recent published studies by Fawzy et al. and by Chesley et al. yielded similar results and conclusions regarding racial differences in occult hypoxemia [6, 7].

Masimo has accounted for the potential confounder of skin pigmentation on pulse oximetry measurements [8], and designed Masimo SET® (Signal Extraction Technology®) using a unique signal-processing method and other engineered solutions to minimize the impact of skin pigmentation and other common pulse oximetry confounders, including motion and low perfusion. In view of the recent concerns regarding pulse oximetry accuracy differences between Black and White patients, we conducted a retrospective laboratory evaluation to assess the accuracy of Masimo SET® pulse oximetry with RD SET® sensors on healthy Black and White volunteers undergoing controlled desaturation studies with SaO2 values ranging between 100 and 70%.

2 Methods

Human volunteer desaturation data collected in the Masimo laboratory between September 2015 and July 2021 were retrospectively evaluated. The protocol underwent review and approval was granted by the Institutional Review Board of Ethical & Independent (E&I) Review Services (Lee’s Summit, MO). Written informed consent was obtained from each subject prior to enrollment. All subjects met inclusion criteria to participate, including a thorough health history screening assessment specifically targeting any systemic diseases or current health conditions that could increase study risk. A complete list of exclusion criteria, including screened health conditions, is provided in Appendix 1. Consequently, enrolled subjects were classified as either ASA I ("a normal healthy subject") or ASA II ("a subject with mild systemic disease") on the ASA Physical Status Classification System. Some subjects were involved in previous studies involving healthy volunteers in the Masimo laboratory, which were not related to SpO2. A few subjects were also involved in previous SpO2 studies, not related to this study, and those data points were not included in this dataset. None of the subjects in this study participated in prior Masimo SET® calibration studies, and none of the prior studies have been published.

The data include 7183 paired samples (3201 Black and 3982 White) obtained from 75 subjects (39 Black and 36 White) who self-identified as being either racially Black or White. With 75 subjects measured, there were total 87 visits, with five subjects in each group having two or three repeat visits. Since there were long intervals between the repeat visits, the study data from multiple visits was considered independent, and none of the visits were excluded from analysis. Demographic data are summarized in Table 1, including: age, sex, race, baseline total hemoglobin (tHb), Carboxyhemoglobin (COHb), Methemoglobin (MetHb), and Massey Scale, a validated index of skin tone [9].

Table 1 Demographic data for Black and White subjects, as well as combined data

Subjects who self-identified as Black had Massey Scale values ranging from 4–9, while those who self-identified as White had Massey Scale values of 1–4. The Massey Scale values are shown in Table 1, and graphically in Fig. 1.

Fig. 1
figure 1

Massey Scale skin color versus number of subjects for self-identified White (blue tint boxes) and Black (salmon tint boxes) subjects

The SpO2 values obtained from Masimo SET® pulse oximeters (MX technology boards) with RD-SET® sensors (Masimo, Irvine, California) were time-matched to be recorded simultaneously with the arterial blood gas (ABG) samples obtained from a radial arterial cannula, and analyzed on a Radiometer ABL-835 Flex CO-Oximeter (Radiometer Inc., Brea, California), which was calibrated daily before use. The ABG samples were collected and handled in accordance with the guidelines provided by the blood gas analyzer manufacturer [10, 11].

The subjects were exposed to a desaturation protocol that sequentially decreased the SpO2 in a stepwise fashion, achieving six stable plateau values between 100 and 70%, while recording simultaneous SaO2 readings. Multiple replicates were obtained from subjects at each plateau depending on stability of the SpO2 reference, and subject safety and comfort. The protocol was consistent with the ISO 80601–2-61 pulse oximetry standard. Data were grouped by self-declared race (Black vs White) and analyzed separately for each group. There was a median of 72 samples per subject for the Black population and 96 samples per subject for the White population.

Statistical calculations included values of bias (mean difference of SpO2-SaO2), precision (standard deviation [SD] of the difference), and accuracy (root-mean-square error [ARMS]). The distribution of the differences did not pass the Kolmogorov–Smirnov normality test, so a non-parametric Wilcoxon Rank Sum test was used to compare median values. Linear regression slope and intercept were calculated, along with the standard error estimate (SEE) for each group. In addition, the rate of occult hypoxemia (SaO2 < 88% when SpO2 = 92–96%) was determined for each self-declared race (Black and White).

3 Results

Figure 2 shows “scatterplots” of SpO2 versus CO-oximeter SaO2 for White and Black subjects. The solid lines show linear regression best-fits, and the dotted lines indicate the SEE limits. There are no significant visible differences between the two scatterplots. Statistical values for the two groups, including bias, precision, root-mean-square error, and numbers of data points are shown in Table 2. Bias and precision are − 0.2 ± 1.40% for Black subjects, and − 0.05 ± 1.35% for White subjects. Based on the Wilcoxon Rank Sum test, the median difference of − 0.2 was statistically significant (p < 0.001). All these error values are well within manufacturer accuracy specifications.

Fig. 2
figure 2

Scatter plot (SpO2 versus SaO2) along with performance metrics for White subjects (Fig. 2a) vs. Black subjects (Fig. 2b)

Table 2 Tabulated summary of performance statistics for Black, White and combined datasets

To directly compare and contrast these data with the published results of Sjoding et al. [3], we plotted SaO2 versus SpO2 in the same manner as provided by those authors (Fig. 3). The box plot graphing protocol includes a horizontal line within each box representing the median; the top and bottom of each box represent the upper and lower limits of the interquartile range, and the whiskers represent 1.5 times the interquartile range. The Sjoding et al. data [3] (reproduced with permission) are depicted in Fig. 3a, whereas the Masimo data points from this study are plotted in Fig. 3b. Sjoding et al. state that true SaO2 values less than 88% combined with SpO2 values of 92–96% represent “occult hypoxemia” not detected by the pulse oximeter. Their plot (Fig. 3a) shows this to be more common in Black than White patients (reported as 11.7% and 3.6%, respectively). Our Masimo SET® data (Fig. 3b) do not demonstrate this tendency, as occult hypoxemia is calculated as 0% and 0.2% for Black and White subjects, respectively. The mean carboxyhemoglobin (COHb) was 1.1% for Black and 0.9% for White subjects, and mean Methemoglobin (MetHb) values were 1.1% for all subjects; COHb and MetHb values were < 1.9% for all subjects.

Fig. 3
figure 3

Box plots showing accuracy comparison for Black (salmon tint) and White (blue tint) ethnic groups binned by 1% saturation bins, from 89–96%. Figure 3a reproduced (with permission) from the Sjoding et al. study [12], and Fig. 3b similar box-plot configuration using Masimo SET® data pairs from the current laboratory study. Shaded area of SaO2%

4 Discussion

The data demonstrate clinically equivalent performance of Masimo SET® pulse oximeters with RD SET® sensors for healthy Black and White subjects. Due to the large number of paired samples a statistically significant difference was calculated between the biases (mean difference of SpO2-SaO2) obtained for Black and White subjects; however, this statistical finding is not relevant, as the numerical value is so small (0.15%, or approximately 1/7 of 1%) that it is not clinically significant. Furthermore, “occult hypoxemia,” as defined in the recent literature [3,4,5,6,7], did not occur in any Black subjects, and was present in only two data pairs from the White subject data pool. These two data pairs resulted in a 0.2% occult hypoxemia rate for White subjects. All accuracy and error values are well within manufacturer specifications.

Subjects who self-identified as Black had a Massey Scale median value of 6 (range 4–9), while those who self-identified as White had a Massey Scale median value of 2 (range 1–4), as shown in Table 1 and Fig. 1. Because individuals within a given race will have a range of skin pigmentation, it is expected that there will be a small number of individuals with similar levels of skin pigmentation in each race, as displayed in Fig. 1.

The absence of racial bias, and highly accurate overall performance exhibited by Masimo SET® pulse oximetry can be logically explained by Masimo’s engineering design and testing paradigm. When a Masimo SET® sensor is activated on a patient, the device adjusts the light intensity to optimize signal quality. This includes automatic adjustment of both light-emitting diode (LED) intensity and the receiver gain to compensate for differences in light absorbance. The light absorbance signal is then processed using Masimo’s “multiple parallel signal processing engine system” to calculate SpO2. Conventional pulse oximetry uses the standard red over infrared algorithm to provide SpO2, while Masimo SET® uses that conventional algorithm but has added four other algorithms that all run in parallel. These algorithms allow the distinction between arterial and venous signal during motion and low perfusion by identifying and isolating the non-arterial and venous noise SpO2 from the true arterial SpO2 components in the signal. These multiple signal processing engines work together to overcome limitations of each independent method. This advanced technique allows for a more accurate picture of the pulsatile (arterial) signal and significantly reduces the impact of static absorbers such as skin pigment and tissue thickness (e.g., finger, toe, or earlobe). Finally, the Masimo SET® SpO2 algorithm is calibrated and then validated using nearly equal numbers of dark and light-skinned subjects.

The recent retrospective studies analyzing the effect of skin pigmentation on pulse oximeter accuracy have had several common methodology limitations. First, most did not indicate pulse oximeter manufacturer, model, or sensor type used for data collection [3, 4, 6, 12,13,14,15], while the rare investigators who did list manufacturer(s), did not stratify their data by manufacturer [5,6,7]. Prior to this report, only two investigations studying racial differences in pulse oximetry identified the manufacturer(s) tested [16, 17]. It is important to emphasize that not all pulse oximeters are created equal with respect to skin color. In addition to significant technological differences, calibration and validation testing of pulse oximetry devices also vary between manufacturers. The U.S. FDA Guidance for medical-grade pulse oximeters requires a minimum of two subjects, or 15% of the study pool, to be dark-skinned during validation trials for 510(k) clearance. Masimo requires adherence to more rigorous pigmentation benchmark standards for calibration and validation studies, utilizing nearly equal numbers of dark-skinned and light-skinned subjects [8]. The rigor of this calibration and validation testing paradigm has helped ensure that Masimo SET® devices are accurate and reliable for all patients, regardless of race, ethnicity, or skin tone. This was demonstrated in a 2017 study comparing Masimo SET® (Radical-7®) pulse oximeters with another manufacturer’s devices in dark-skinned and light-skinned infants with hypoxemia [17]. These results showed an overall measurement bias of 0.8% for Masimo SET® compared to 3.9% for the other manufacturer’s device. Further, the bias difference between dark and light skinned infants was 1.4% for Masimo SET® compared to 2.4% for the other manufacturer’s device.

Other key methodological concerns with the recent retrospective clinical studies evaluating skin pigmentation and SpO2 accuracy include prolonged lag times between SpO2 and SaO2 readings, and absence of COHb and MetHb measurements. All of the recent studies allowed SpO2 measurements to be taken up to 5 min, or even 10 min of the SaO2 sample [3,4,5,6,7, 12, 13, 15, 16]. Per ISO 80601-2-61 standard, delays > 30 s are not considered current. It is widely recognized that blood oxygen saturation values can change significantly within this timeframe in patients [18]. In addition, most of these studies did not evaluate the impact of COHb and MetHb on SpO2 readings [4,5,6,7, 12, 14,15,16]. Elevated COHb and MetHb values are known to alter SpO2 values when measured using conventional pulse oximetry [19]. Furthermore, several of these studies evaluated pulse oximeter accuracy in COVID-19 patients [4, 6, 7, 16], and recent investigations demonstrate that COVID-19 patients can harbor significantly elevated endogenous COHb and MetHb values [20].

There are two notable limitations of our study. First, the retrospective data were collected from healthy subjects using a controlled laboratory desaturation protocol; thus, common clinical challenges to SpO2 readings present in critically ill patients (e.g., changes in breathing, anemia, abnormal extremity perfusion, etc.) were not present. However, this limitation should also be viewed as a strength of the study, as the elimination of known confounders facilitates greater focus on the evaluation of skin pigmentation. Furthermore, it should be recognized that desaturation studies can only be ethically performed on healthy subjects in a controlled laboratory setting. Second, this study is focused on the Black and White populations and does not evaluate other ethnic groups (i.e., Asian, Hispanic); however, this was done to maximize the contrast in skin pigmentation.

Future clinical studies on racial differences between SpO2-SaO2 measurements and the frequency of occult hypoxemia should examine critically ill patients using a prospective protocol that specifies the pulse oximeter manufacturer(s), model(s), and sensor(s). These studies should also include an objective measurement of skin pigmentation, such as the Massey Scale, to stratify the results by skin tone. In addition, SpO2 values should be recorded simultaneously with the SaO2 blood samples. Known pulse oximeter confounders, including elevated COHb and MetHb, should be excluded or evaluated in sub-group analyses. Finally, patient clinical status and drug history should also be recorded for sub-group analysis.

In conclusion, this retrospective study of healthy human volunteers monitored with Masimo RD SET® pulse oximeter sensors, showed an absence of clinically significant differences in accuracy between Black and White subjects. Furthermore, occult hypoxemia did not occur in Black subjects, and the occurrence of this finding was rare in White subjects. Prospective clinical studies are needed to validate these results in critically ill patients utilizing Masimo SET® pulse oximeters and other devices, stratified by manufacturer.