Pulse oximeter performance is degraded by motion artifacts and low perfusion. Manufacturers developed algorithms to improve instrument performance during these challenges. There have been no independent comparisons of these devices.
We evaluated the performance of four pulse oximeters (Masimo Radical-7, USA; Nihon Kohden OxyPal Neo, Japan; Nellcor N-600, USA; and Philips Intellivue MP5, USA) in 10 healthy adult volunteers. Three motions were evaluated: tapping, pseudorandom, and volunteer-generated rubbing, adjusted to produce photoplethysmogram disturbance similar to arterial pulsation amplitude. During motion, inspired gases were adjusted to achieve stable target plateaus of arterial oxygen saturation (SaO2) at 75%, 88%, and 100%. Pulse oximeter readings were compared with simultaneous arterial blood samples to calculate bias (oxygen saturation measured by pulse oximetry [SpO2] − SaO2), mean, SD, 95% limits of agreement, and root mean square error. Receiver operating characteristic curves were determined to detect mild (SaO2 < 90%) and severe (SaO2 < 80%) hypoxemia.
Pulse oximeter readings corresponding to 190 blood samples were analyzed. All oximeters detected hypoxia but motion and low perfusion degraded performance. Three of four oximeters (Masimo, Nellcor, and Philips) had root mean square error greater than 3% for SaO2 70 to 100% during any motion, compared to a root mean square error of 1.8% for the stationary control. A low perfusion index increased error.
All oximeters detected hypoxemia during motion and low-perfusion conditions, but motion impaired performance at all ranges, with less accuracy at lower SaO2. Lower perfusion degraded performance in all but the Nihon Kohden instrument. We conclude that different types of pulse oximeters can be similarly effective in preserving sensitivity to clinically relevant hypoxia.
This study determined the performance of four pulse oximeters (Masimo Radical-7, USA; Nihon Kohden OxyPal Neo, Japan; Nellcor N-600, USA; and Philips Intellivue MP5, USA) in 10 healthy adult volunteers.
All oximeters detected hypoxemia during motion and low-perfusion conditions, but motion impaired performance at all ranges, with less accuracy at lower arterial oxygen saturation. Lower perfusion degraded performance in all but the Nihon Kohden instrument.
Pulse oximetry is a noninvasive technology for continuous monitoring of arterial oxygen saturation (Sao2), and has been a standard tool used to assess oxygenation and respiratory function in patients.1–3 Pulse oximeters transmit red and near-infrared light across a tissue bed (e.g., finger, toe, or earlobe) and detect changes in light absorbance to calculate estimated Sao2 based on photoplethysmography.2 Healthcare providers rely on readings of oxygen saturation measured by pulse oximetry (Spo2) to provide accurate measurements of oxygen saturation, especially in the detection of hypoxemia associated with deterioration of respiratory function.2
Patient movement and low perfusion to extremities can generate artifacts that reduce the accuracy of Spo2 readings.1 These limitations are particularly relevant in clinical settings like obstetric units (e.g., laboring women or women with neuraxial anesthesia undergoing cesarean delivery), intensive care unit and postanesthesia care unit, where patients experience voluntary and involuntary movement, including tapping, rubbing, shivering, and seizures in adult and pediatric patients, and kicking and crying in neonates.4 Motion artifacts can result in low signal-to-noise ratios and underestimation of Sao2 due to venous blood motion. Errors can be exacerbated by low perfusion.5 Furthermore, these limitations can obscure true signals, triggering false alarms that may lead to ignored true alarms—all of which can compromise patient safety and increase cost of care.6,7
Over the years, manufacturers of pulse oximeters have developed software algorithms to reduce motion artifacts; the devices containing such software are marketed as motion tolerant/resistant.8 Other groups have studied and compared the performances of various “new generation” pulse oximeters under simulated motion and low perfusion. To date, these studies have mainly compared Spo2 readings from volunteer-initiated hand motion to readings of the volunteer’s nonmotion hand as the only reference for true Sao2 value.8–14 The U.S. Food and Drug Administration (FDA; Silver Spring, Maryland), however, requires that manufacturers validate accuracy specifications of pulse oximeters by comparing each Spo2 value with a “gold standard” Sao2 measurement collected by simultaneous co-oximetry of an arterial blood sample.15 Similar recommendations exist for validating new generation devices claiming resistance to motion and low perfusion artifacts. While the FDA has a specification of root mean square error less than or equal to 3.0% for pulse oximeter accuracy testing across a 70 to 100% Sao2 range, there is currently no numerical standard for motion and low perfusion conditions.
In this study we evaluated four commercially available motion- and low perfusion–tolerant pulse oximeters during different types of controlled movement and a range of perfusion. Consistent with FDA guidelines, Spo2 readings from each device were compared with Sao2 measurements from simultaneous arterial blood sampling.
Materials and Methods
This study was approved by the University of California, San Francisco Committee on Human Research (San Francisco, California). Written informed consent was obtained from all subjects. Ten healthy adult subjects participated in the study: six men and four women, who ranged in age from 22 to 44 yr (median age, 27 yr). Skin tone varied to include 10% light, 40% light/medium, 30% medium, 20% medium/dark, and no dark skin tone. Ethnicity varied to include 60% Asian, 30% Caucasian, 10% Latino, and no African American. All subjects were nonsmokers with no evidence of lung disease, obesity, or cardiovascular problems. Sample size was based on FDA guidelines3 for accuracy testing that required at least 200 data points balanced across each decadal range (70 to 80%, 80 to 90%, and 90 to 100%) of the Sao2 range from 70 to 100%. Subjects were studied using the identical protocol implemented by our laboratory that routinely tests pulse oximeters for FDA 510(k) certification.16
Subjects were placed in the semirecumbent position (30° head up) with a nose clip and breathed air–nitrogen–carbon dioxide mixtures through a mouthpiece from a partial rebreathing circuit with a voluntarily increased minute ventilation and 10 to 20 l/min fresh gas inflow, but with carbon dioxide added as needed to maintain normocapnia. The control hand was placed on a stationary arm rest at elbow level, where an indwelling 22-gauge radial artery catheter was placed to sample arterial blood for measurement of Sao2. The opposite motion hand rested securely on a motorized table that generated repeatable and continuous vertical hand movements while the elbow remained fixed and the fingertips either tapped or rubbed on a smooth surface.
Each subject was monitored with four pulse oximeters on the test hand: Masimo Radical-7 with SET software V220.127.116.11 (USA), Nellcor OxyMax N-600 with software version 18.104.22.168 (USA), Nihon Kohden OxyPal Neo with software Ver79-06, 98-16 (Japan), and Philips Intellivue MP5 with software L.10.75 (USA). Disposable adhesive sensors were used to prevent sensor displacement and were randomly assigned to digits 2 to 5 on every subject’s test hand. A reference oximeter (Masimo Radical-7) was mounted randomly to a digit on the control hand using a reusable sensor, and baseline perfusion index (PI) was recorded on that hand.
Three motion modalities were tested: tapping, random, and rubbing. The motion table was programmed to perform (1) machine-generated tapping at a fixed amplitude of ±3 cm and frequency of 2 Hz; (2) machine-generated aperiodic random motion with maximal amplitude and frequency at ±3 cm and 2 Hz, respectively; or (3) volunteer-generated 2-Hz sideways rubbing with the aid of a metronome. Each of these movements produced significant photoplethysmogram disturbance artifacts that were adjusted to be 100 ± 25% of the amplitude of arterial pulsations. We made no efforts to control the perfusion state of the subjects; however, all subjects were well hydrated and the room temperature was about 28°C.
A series of three stable target Sao2 plateaus between 70 and 100% (approximately 73%, 88%, and 98%) were targeted by the operator, who adjusted the inspired air–nitrogen–carbon dioxide mixture breath-by-breath to achieve the desired saturation. This was done by using end-tidal carbon dioxide and oxygen analysis (Applied Electrochemistry oxygen and carbon dioxide analyzers, USA) and a computer algorithm (LabVIEW 2013, National Instruments, USA) that involves a model oxyhemoglobin dissociation curve and inputted values for hemoglobin P50, arterial-alveolar gas gradients, and base excess.17
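The role of a model oxyhemoglobin dissociation curve in choosing a target can be illustrated with a minimal sketch. It assumes the Hill equation with textbook-typical values (P50 of 26.8 mmHg, Hill coefficient of 2.7); these values and both function names are illustrative assumptions, not the laboratory's actual algorithm, which also accounts for arterial-alveolar gradients and base excess.

```python
def hill_saturation(po2_mmhg, p50_mmhg=26.8, n=2.7):
    """Fractional saturation predicted by the Hill equation (assumed model)."""
    x = po2_mmhg ** n
    return x / (x + p50_mmhg ** n)


def po2_for_saturation(target, p50_mmhg=26.8, n=2.7):
    """Invert the Hill equation: PO2 (mmHg) expected to give a target saturation."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction strictly between 0 and 1")
    return p50_mmhg * (target / (1.0 - target)) ** (1.0 / n)


if __name__ == "__main__":
    # Approximate plateau targets taken from the text: 73%, 88%, 98%.
    for target in (0.73, 0.88, 0.98):
        print(f"target {target:.0%}: PO2 about {po2_for_saturation(target):.1f} mmHg")
```

By construction, the saturation at PO2 equal to P50 is 50%, and the two functions are inverses of each other.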
At each level, arterial blood was sampled after a stable plateau of 30 to 60 s had been achieved, followed by a second sample at the same plateau 30 s later. Parallel Spo2 readings from all tested oximeters were recorded by hand throughout the protocol. Functional arterial Sao2 (Hbo2/[Hb + Hbo2]) was determined by multiwavelength oximetry using a Radiometer ABL-90 (Denmark), which was calibrated according to manufacturer recommendations.
Specific power calculations for a study involving a mixed-effects model are complex and were not undertaken. However, based on our previous published pulse oximeter performance studies using a repeated-measures design and a 10-subject pool, we have found statistically significant differences in pulse oximeter performance that are smaller than clinically relevant effects using the same sample size.18 Additionally, a study of 10 subjects conforms to the FDA’s guidelines for study design related to claims for pulse oximetry motion performance.3 Further, an unpublished pilot study in our laboratory was performed with 10 subjects using an identical motion protocol, and results revealed power to discriminate differences of 5% in missed readings. The study was not powered to examine differences in sex, skin tone, and ethnicity, but used a subject pool balanced in these factors according to FDA requirements.
Bias was calculated as Spo2 minus Sao2 from each oximeter’s value and the corresponding arterial blood value. Bias is summarized as mean ± SD, where the SD is considered the precision. Root mean square error was calculated as the square root of the mean of the squared differences (Spo2 − Sao2). The 95% confidence limits of the root mean square error were determined using bootstrapping (random resampling with replacement) with 50,000 repetitions. The 95% limits of agreement (LOA) were calculated as mean bias ± 1.96 · SD according to Bland and Altman, with adjustments for multiple measurements for each individual according to the “Method Where the True Value Varies.”19 The 95% confidence limits for the LOA were determined using bootstrapping as above.
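The summary statistics described above can be sketched in a few lines of Python. This is an illustrative sketch only: the paired readings are made-up numbers, the function names are hypothetical, and the limits of agreement and bootstrap shown here are the simple unadjusted forms, without the repeated-measures correction the study applied.

```python
import math
import random
import statistics


def bias_stats(spo2, sao2):
    """Mean bias, SD (precision), RMSE, and unadjusted 95% limits of agreement."""
    diffs = [s - a for s, a in zip(spo2, sao2)]
    mean_bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the bias
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    loa = (mean_bias - 1.96 * sd, mean_bias + 1.96 * sd)
    return mean_bias, sd, rmse, loa


def bootstrap_rmse_ci(spo2, sao2, reps=2000, seed=0):
    """Percentile 95% CI for RMSE by resampling paired differences with replacement."""
    rng = random.Random(seed)
    diffs = [s - a for s, a in zip(spo2, sao2)]
    n = len(diffs)
    estimates = []
    for _ in range(reps):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        estimates.append(math.sqrt(sum(d * d for d in sample) / n))
    estimates.sort()
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]


if __name__ == "__main__":
    # Hypothetical paired readings (percent saturation), not study data.
    spo2 = [97, 90, 76, 99, 85, 70]
    sao2 = [98, 88, 73, 98, 88, 74]
    print(bias_stats(spo2, sao2))
    print(bootstrap_rmse_ci(spo2, sao2))
```

The study used 50,000 bootstrap repetitions; the smaller default here only keeps the sketch fast.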
Bias and the absolute value of the bias under different motions and different ranges of Sao2 were compared using repeated-measures ANOVA with Tukey-Kramer honestly significant difference for multiple comparisons. Levene’s test was used to compare variances between the different motions and Sao2 ranges.
Receiver operating characteristic (ROC) curves were constructed for each oximeter’s Spo2 determination of hypoxia (defined as Sao2 < 90%) and severe hypoxia (defined as Sao2 < 80%). The area under the curve (AUC) and 95% CIs were calculated. A Fisher exact test was used to compare the incidence of low perfusion (PI < 2) between male and female subjects. A P value less than 0.05 was considered statistically significant. Statistical analysis was performed with JMP 11.0 (SAS Institute, USA) and Stata 14 (StataCorp, USA).
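As an illustration of the ROC analysis, the AUC can be computed directly as the probability that a randomly chosen hypoxemic sample has a lower Spo2 reading than a randomly chosen non-hypoxemic one (the Mann-Whitney formulation, with ties counted as one half). The function name and the example readings below are hypothetical, and this sketch does not reproduce the confidence intervals calculated in the statistical packages used in the study.

```python
def roc_auc(spo2, sao2, threshold=90.0):
    """AUC for using SpO2 to detect SaO2 below `threshold`.

    Lower SpO2 is treated as stronger evidence of hypoxemia.
    """
    pos = [s for s, a in zip(spo2, sao2) if a < threshold]   # truly hypoxemic
    neg = [s for s, a in zip(spo2, sao2) if a >= threshold]  # not hypoxemic
    if not pos or not neg:
        raise ValueError("both hypoxemic and non-hypoxemic samples are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical readings (percent saturation), not study data.
    spo2 = [95, 89, 92, 91]
    sao2 = [96, 88, 87, 93]
    print(roc_auc(spo2, sao2, threshold=90.0))  # 0.75
```

An AUC of 1.0 means every hypoxemic sample read lower than every non-hypoxemic sample; 0.5 is no better than chance.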
We obtained data from 10 healthy adult subjects: six men and four women, who ranged in age from 22 to 44 yr (median age, 27 yr). For each of the 190 arterial blood samples, readings were recorded at the time of sampling from the four pulse oximeters subjected to motion and from the one stationary reference oximeter.
Comparison of Bias (Spo2 − Sao2) in Different Motion Modalities
Table 1 represents data comparing the performance of four pulse oximeters during different modalities of motion (tapping, rubbing, and random). For tapping motion, all pulse oximeters had significantly higher absolute mean bias than the non–motion control oximeter. For rubbing motion, all oximeters had significantly higher absolute biases except Nihon Kohden. In random motion, only Masimo displayed significantly higher absolute biases than the reference.
At any Sao2 level, all pulse oximeters except Nellcor demonstrated higher root mean square errors than the corresponding nonmotion reference results during each of the three motion tests. Nellcor exhibited lower root mean square errors than the reference during random movement. When all motions were analyzed as a whole without concentrating on any particular motion type, all pulse oximeters had higher root mean square errors than the stationary control (root mean square errors of 1.8%; 95% CI, 1.55 to 2.01), with Nihon Kohden’s root mean square errors of 2.2% being the lowest among the motion-tested machines.
When comparing performance between the three tested motion modalities, no significant differences in mean absolute bias or precision were found for Masimo and Nihon Kohden, whereas Nellcor and Philips demonstrated significant differences for both these performance parameters. Nihon Kohden missed one reading during rubbing, and Philips missed a number of readings, in all types of motion tests.
Comparison of Bias (Spo2 − Sao2) in Different Ranges of Sao2
Table 2 compares pulse oximetry performance during any motion across “decadal” Sao2 ranges of 70 to 80%, 80 to 90%, and 90 to 100%, but with the four Sao2 plateaus from 68.5% reported with the 70 to 80% range. The introduction of motion caused all four oximeters to produce significantly higher mean absolute bias at the severely hypoxic range (68.5 to 80% Sao2), whereas moderate hypoxia (80 to 90% Sao2) combined with motion increased bias for all oximeters except Nihon Kohden. In the normoxic range (90 to 100% Sao2), Masimo and Nellcor both reported significantly higher mean absolute bias when motion was present. The SD of the bias (precision) increased for all instruments, except Nihon Kohden at 90 to 100%. For every oximeter tested in motion, root mean square errors were higher than the nonmotion reference for all instruments at every Sao2 range.
Successively lower Sao2 ranges significantly degraded performance (increased mean absolute bias and root mean square errors and decreased precision) for the nonmotion reference and three of four motion-tested oximeters. Different oxygen saturation ranges did not significantly affect the mean absolute bias or precision for Nellcor, and its measured root mean square error values were higher at higher Sao2 ranges.
For each pulse oximeter tested, the biases from each motion test were plotted against Sao2 using a modified version of the Bland-Altman method20 (fig. 1). During motion, bias values deviated further from zero as Sao2 declined. The three motion modalities of tapping, rubbing, and random impacted the distribution of bias values differently for each pulse oximeter, as reflected by the corresponding upper and lower LOA.
Comparison of Bias (Spo2 − Sao2) during Low- and Well-perfused States
Table 3 compares the performance of the tested oximeters in different states of blood perfusion. We used a cutoff of PI less than 2.0 to represent poor perfusion.21 Generally, imprecision was significantly higher with poor perfusion for all the motion-tested pulse oximeters and the nonmotion reference. However, Nihon Kohden during motion did not demonstrate significant differences in precision or mean absolute bias between perfusion index values below and above 2. Along with the non–motion control, both Masimo and Philips showed significantly different mean absolute bias between the two perfusion categories during motion. All four female subjects, along with one of the male subjects, had poor perfusion, a significantly higher incidence in female than in male subjects (P = 0.048; fig. 2). LOAs were further from zero bias when perfusion index was less than 2 (fig. 3).
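The sex comparison can be checked with a small two-sided Fisher exact test. The sketch below assumes one reading of the result above, namely that each subject is classified once, with 4 of 4 women and 1 of 6 men in the low-perfusion group; the function name is hypothetical. Under that assumption the two-sided P value is 12/252, about 0.048, which agrees with the reported value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Rows: women, men. Columns: low perfusion (PI < 2), normal perfusion.
    p = fisher_exact_two_sided(4, 0, 1, 5)
    print(round(p, 3))  # 0.048
```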
Sensitivity and Specificity
The AUC was at least 95% in detecting hypoxemia (Sao2 < 90%) or severe hypoxemia (Sao2 < 80%) for every pulse oximeter tested during any motion with all motion types combined (fig. 4). The AUC for the nonmotion reference was 100%. During tapping motion alone, the AUC ranged from 86.2% (Philips at severe hypoxemia) to 100% (data not shown). During rubbing motion, the AUC ranged from 92.7% (Masimo at severe hypoxemia) to 100%. During random motion, the AUC ranged from 93.7% (Masimo at severe hypoxemia) to 100%.
During hypoxia, AUC ranged from 93.7% (Nellcor) to 100% at low perfusion (PI < 2), and 98.4% (Philips) to 100% at high perfusion (PI ≥ 2; data not shown). During severe hypoxia, AUC ranged from 90.9% (Masimo) to 100% for poor perfusion, and 97.8% (Nellcor) to 100% with good perfusion.
This is the first independent examination of the comparative performance of motion-tolerant and low perfusion–resistant pulse oximeters. Previous studies concerning this type of assessment were industry supported, restricted to a single device, or did not involve a comparison to measured saturation in arterial blood samples. Our findings indicate that four motion-tolerant low perfusion–resistant pulse oximeters behaved similarly during controlled motion and hypoxia. All devices had at least a 95% sensitivity and specificity in detecting hypoxemia (Sao2 ~ 88%) and severe hypoxemia (Sao2 ~ 78%) during motion. Although there were some differences in the precision of the four instruments, all instruments detected hypoxia during motion and across a range of perfusion conditions in healthy subjects under controlled laboratory conditions (ROC curves, fig. 4). Analyzing pulse oximeter performance using ROC curves reflects a clinically relevant approach. An important purpose of pulse oximetry is to determine whether a patient is normoxic or hypoxic. Knowing the exact oxygen saturation in a moving or critically ill patient may not be the most important clinical question in many circumstances. The analysis demonstrates excellent sensitivity and specificity of all the devices during motion. While there were some high Spo2 readings during hypoxemia (false negatives), which decreased sensitivity, it is also possible that the oximeters would have produced low Spo2 readings if given longer processing time.
Relevance of Laboratory Motion and Low Perfusion Challenges to Clinical Conditions
This study examined controlled motion in a laboratory setting. In terms of reproducibility and between-instrument comparisons, this study design has advantages compared to studies in spontaneously moving patients. However, the clinical applicability of the findings is less clear than in studies involving spontaneous motion in patients. Our study did involve significant disturbances in the photoplethysmogram, as it produced decreased precision in all oximeters tested. The magnitude of the movement disturbances was adjusted in each subject to produce motion artifacts in the photoplethysmogram signal that were similar in amplitude to the arterial pulse waveforms. Therefore, each oximeter was challenged to reject pulsatile signals that did not correspond to actual arterial pulsations. How this is achieved by the different devices is not known to the user; the methods are both trade secrets and patent protected. It was interesting that, despite probable differences in the processing software and hardware in the different instruments, all four types of instruments performed similarly well in detecting the changes in saturation between room air and 88% saturation and a change between 88% and 77% saturation. We see no reason why this would be different in clinical situations involving desaturations in patients during movement disturbances.
Effect of Different Motion Types on Performance
We analyzed three different motion types for each of the four devices and found that motion increased mean bias and root mean square errors for all pulse oximeters. Some small differences in the performance of the oximeters during different types of motion were observed, but an obvious pattern was not apparent. This was surprising because we assumed that the instruments use different types of algorithms to determine saturation. The root mean square error was more than 3% for many of the motions in all devices with the exception of Nihon Kohden. Although Philips’s root mean square error was under 3.0% for the pseudorandom motion, there were many incidents of failure to display a saturation value (“Dropped,” table 1). Our motion protocol involved two types of repetitive hand motion that were employed in previous studies of pulse oximeter performance under motion and low perfusion conditions.9,14
We did not specifically design the study to examine the effects of low perfusion on pulse oximeter performance. However, variations in perfusion index in our subjects enabled us to examine the role of this variable in the precision of the instruments. Several previous studies have reported that low perfusion degrades pulse oximeter performance and results in nondisplayed saturation values. As far as we are aware, the only previous study quantitatively examining the relationship of decreased perfusion index to increased bias was the study by Hummler et al.21 In the Hummler et al. study, bias increased when PI was less than 2. We defined a perfusion index of less than 2 as low perfusion; this cutoff is supported by the large decrease in precision in readings at these lower PI values (fig. 3). As has been our experience with many years of pulse oximetry testing, our female subjects were more likely to have a low perfusion index (less than 2%). Low perfusion was associated with increasing mean absolute bias and decreased precision (table 3). This is an important association because it implies that sex affects pulse oximeter performance through the mechanism of low perfusion. Although it is possible to heat the hands with heating pads or warm air, it may not always be practical to do so in patients.
Our study had several limitations. The first was that, of necessity in a controlled laboratory setting, controlled motion was used. The motions may not reflect the types of motion that present challenges to pulse oximeter performance in patients, and therefore the extrapolation of our findings to clinical conditions is limited. However, our motion protocols created a substantial disturbance to the photoplethysmograph waveform, one that was of the same magnitude as the pulse signals. In addition, our motion controller software permitted inclusion of a “pseudorandom motion” protocol, wherein motion is randomly jerky, of varying amplitude and direction within specified extremes. This may be similar to some types of patient motion. Based on our knowledge of how motion-resistant algorithms work, our motion disturbances were expected to be a significant challenge to the instruments. A study in actively moving patients involving a blinded comparison of different types of motion tolerant/resistant pulse oximeters is needed.
Another limitation was that the study did not encompass a full range of perfusion values in all subject types; the subjects with low perfusion were predominantly female. Therefore, the generalizability of our conclusion that low perfusion performance was similar among all devices is limited. Our protocol also maintained plateaus for defined periods of time. It is possible that some of the pulse oximeters might have determined more accurate and precise readings if provided a longer period of processing time. We did not include a nonmotion resistant pulse oximeter on the motion hand as control. This was not practical because we did not want an instrument on the thumb. As mentioned, even the motion-resistant oximeters examined showed decreased performance in our study, so we do not believe that this was a significant limitation. Our study was not powered to specifically examine differences in skin tone, ethnicity, and sex, as these are not repeated-measures tests and would require a significantly larger sample size. The subject diversity was designed only to follow FDA guidelines regarding subject composition. The study was also not specifically designed to compare one oximeter to the other, and differences could have been influenced by other factors.
Another study limitation is that the study may not have been powered to identify differences between oximeter performance during motion, even though the study was designed with a sample size sufficient to detect statistical, but perhaps not clinically relevant, differences in oximeter performance in detecting changes in oxygenation.18 It seems unlikely to us that the small differences in performances observed in the study have any clinical significance.
Motion and low perfusion degraded the performance of four types of pulse oximeters that are marketed as motion-resistant devices, but all four types tested detected hypoxia with greater than 95% specificity. Low perfusion was associated with less precision. We conclude that different types of algorithms to read through motion and low perfusion are similarly capable of detecting significant changes in oxygenation under controlled laboratory conditions.
This study was sponsored by the University of California, San Francisco Hypoxia Research Laboratory (San Francisco, California) using unrestricted funds derived from the testing of medical devices. No manufacturer directly funded the study or was involved in study design or data analysis.
Drs. Feiner, Bickler, and Lucero have previously funded independent university-based research with funds collected from companies who contracted with our lab to perform evaluation of instrument performance. In the last five years the following companies contracted with the Hypoxia lab: Masimo (Irvine, California), Nonin (Plymouth, Minnesota), Bluepoint Medical (Centreville, Virginia), Xhale (Wales, England), Nihon Kohden (Tokyo, Japan), Motara (Milwaukee, Wisconsin), Medico (Tokyo, Japan), Philips (Amsterdam, The Netherlands), Renu (Everett, Washington), Scanadu (Sunnyvale, California), Sensepc (Rostock, Germany), Sentec (Therwil, Switzerland), Sharp (Osaka, Japan), Sleep Med (Peabody, Massachusetts), Solaris (San Francisco, California), Springer (Berlin, Germany), True Wearables (Rancho Santa Margarita, California), Unimed (Shenzhen, China), Verifood (Vancouver, British Columbia), Vios (Oakdale, Minnesota), Zensorium (Singapore), CAS Medica (Branford, Connecticut). Funds from this work supported the research reported in this study. However, no manufacturer directly funded the study or was involved in study design, data analysis, or any aspect of manuscript writing or editing.